Parser adversarial review: strict VERT metadata, surfaced partial-load warnings #32
Merged
…rnings
Adversarial pass over io/readers (truncation, corruption, partial files),
pinned by the new tests/test_parser_adversarial.py corpus tests:
- Createc VERT metadata fast path used the lenient (and deprecated)
np.fromstring, which silently stops at the first unparsable token: a
corrupt or mid-row-truncated table summarised as healthy (full row count,
no warnings) while the full parse raised — so browse and the metadata
cache showed a spectrum the viewer could not load. The summary now
validates rows exactly as strictly as the full parse (consistent column
count, every token a float), matching the Nanonis reader's existing
strict summary. The old test that pinned the leniency is updated to pin
the symmetric contract instead.
- scan.warnings was consumed nowhere: the SXM/SM4 readers degrade
gracefully on partial files (a scan still being written loads its
complete planes only) and record exactly why — and the explanation was
then dropped, leaving the user with silently missing channels.
ViewerScanData now carries scan_warnings and the viewer shows them in the
status bar at load.
- Nanonis spec: a valid single-column file failed the column-count check
(loadtxt collapses one column to 1-D and it was reshaped to one wide
row); the column-header length now disambiguates.
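The strict-summary contract from the first bullet can be illustrated with a minimal sketch. It is not the project's reader code: `parse_rows_strict` and `summarise_rows` are hypothetical names, and the whitespace-separated layout and blank-line skipping are assumptions made for illustration. The point is only that the summary goes through the same validation as the full parse, so it cannot report a healthy row count for a table the full parse would reject.

```python
def parse_rows_strict(text: str) -> list[list[float]]:
    """Parse a whitespace-separated numeric table, rejecting any bad row.

    Hypothetical helper: every token must be a float and every row must
    have the same column count as the first row.
    """
    rows: list[list[float]] = []
    expected = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # assumption: blank lines are not data rows
        try:
            row = [float(tok) for tok in tokens]
        except ValueError as exc:
            raise ValueError(f"row {lineno}: non-numeric token in {line!r}") from exc
        if expected is None:
            expected = len(row)
        elif len(row) != expected:
            raise ValueError(
                f"row {lineno}: expected {expected} columns, got {len(row)}"
            )
        rows.append(row)
    return rows


def summarise_rows(text: str) -> dict[str, int]:
    """Summary built on the strict parse, so both paths fail together."""
    rows = parse_rows_strict(text)
    return {"n_rows": len(rows), "n_cols": len(rows[0]) if rows else 0}
```

A real fast path would likely avoid materialising every row; the sketch trades that for making the symmetry obvious. Note that a file cut exactly after a complete token on a complete row still parses, which is why row-boundary truncation needs a separate row-count warning.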
Verified sound, now pinned by corpus tests across every real fixture:
metadata <-> full-parse agreement for VERT / Nanonis spec / Createc DAT;
Createc DAT truncation diagnostics ("corrupt or truncated", byte counts,
format token) and dimension guards; Nanonis spec strict two-path failures;
SXM/SM4 partial-file tolerance returning only complete, fully-finite
planes with explicit warnings; VERT row-boundary truncation warning
(partial sweeps); 64 KiB chunk-boundary DATA-marker handling in the
streaming header reader. Disproved during review: the VERT Vpoint
time-trace threshold units (Vpoint.V is in mV, consistent with the data
column).
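For the chunk-boundary case mentioned above, one common way to keep a marker search correct when the marker straddles two reads is to carry the last `len(marker) - 1` bytes of each window into the next. The sketch below assumes a literal `b"DATA"` marker and a hypothetical `find_marker_offset` helper; the project's streaming header reader may be structured quite differently.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB, matching the chunk size named above


def find_marker_offset(stream, marker: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Return the offset of the first `marker` in `stream`, or -1.

    Offsets are relative to the stream position at call time. Hypothetical
    helper, not the project's reader.
    """
    if not marker:
        raise ValueError("marker must be non-empty")
    keep = len(marker) - 1  # bytes that could begin a split marker
    tail = b""
    consumed = 0  # bytes read before the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        idx = window.find(marker)
        if idx != -1:
            # window starts `len(tail)` bytes before the current chunk
            return consumed - len(tail) + idx
        consumed += len(chunk)
        tail = window[-keep:] if keep else b""


# Usage: a marker split across the 64 KiB boundary is still found.
data = b"x" * (CHUNK_SIZE - 2) + b"DATA" + b"payload"
offset = find_marker_offset(io.BytesIO(data), b"DATA")
```

Searching each chunk in isolation would miss this marker, because its first two bytes end one read and its last two begin the next.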
Co-Authored-By: Claude Fable 5 <[email protected]>
Summary
Adversarial pass over the raw file decoders (io/readers/) using mutated real fixtures: truncation at arbitrary points, corrupted rows, hidden header keys, partial files.
Findings (fixed)
- [WARN] Createc VERT metadata summary was lenient where the full parse is strict: the deprecated np.fromstring silently stops at the first bad token, so a corrupt/truncated table summarised as healthy in browse (and the metadata cache) while the viewer's full parse raised. The summary now validates per-row column counts and floats exactly like the full parse. The old test pinning the leniency now pins the symmetric contract.
- [WARN] scan.warnings was recorded and then dropped: SXM/SM4 readers handle partial files correctly (complete planes only, with explanations like "file may have been incompletely written, missing planes: 3"), but nothing consumed the explanation; users saw missing channels with no reason. ViewerScanData now carries the warnings and the viewer shows them at load.
- [INFO] Nanonis spec single-column files failed the column-count check (1-D loadtxt result reshaped to one wide row); disambiguated via the header length.
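The warnings fix amounts to carrying the reader's explanations through to the object the viewer consumes. A rough sketch of that hand-off follows; `ViewerScanData` and `scan_warnings` are names from this PR, but the `Scan` stand-in, the field layout, `to_viewer_data`, and `status_bar_text` are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Scan:
    """Hypothetical stand-in for a reader result."""
    planes: dict[str, list[float]]
    warnings: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class ViewerScanData:
    """Name from the PR; the fields here are illustrative assumptions."""
    channels: dict[str, list[float]]
    scan_warnings: tuple[str, ...] = ()


def to_viewer_data(scan: Scan) -> ViewerScanData:
    # Copy the warnings across instead of dropping them at the boundary.
    return ViewerScanData(
        channels=dict(scan.planes),
        scan_warnings=tuple(scan.warnings),
    )


def status_bar_text(data: ViewerScanData) -> str:
    # Hypothetical formatting of the message shown at load.
    if not data.scan_warnings:
        return "Loaded"
    return "Loaded with warnings: " + "; ".join(data.scan_warnings)


partial = Scan(
    planes={"Z": [0.0, 1.0]},
    warnings=["file may have been incompletely written, missing planes: 3"],
)
message = status_bar_text(to_viewer_data(partial))
```

Storing the warnings as an immutable tuple on the viewer-side object is a design choice of this sketch, made so later stages cannot silently clear them.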
Verified sound (now pinned)
Createc DAT's excellent truncation diagnostics and dimension guards; Nanonis spec's strict symmetric failures; SXM/SM4 partial-file tolerance (complete, fully finite planes + warnings); VERT partial-sweep row-count warnings; chunked header reads across the 64 KiB boundary. Corpus sweep: metadata ↔ full-parse agreement asserted over every real fixture in test_data. Disproved: VERT Vpoint threshold units (mV, consistent).
Test plan
tests/test_parser_adversarial.py: 36 tests over mutated real fixtures.
🤖 Generated with Claude Code
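The metadata ↔ full-parse agreement check over mutated fixtures can be sketched as a truncation sweep: cut a fixture at every byte offset and require that the summary and the full parse either both succeed or both raise. Everything below is a toy illustration under stated assumptions; `disagreements`, `parse_full`, and `lenient_summary` are hypothetical and do not reproduce the tests in tests/test_parser_adversarial.py.

```python
def outcome(fn, payload: bytes) -> str:
    """Classify a parser call as 'ok' or 'error' (ValueError only)."""
    try:
        fn(payload)
    except ValueError:
        return "error"
    return "ok"


def disagreements(fixture: bytes, summarise, parse_full, cuts) -> list[int]:
    """Return the truncation offsets where the two paths disagree."""
    return [
        cut for cut in cuts
        if outcome(summarise, fixture[:cut]) != outcome(parse_full, fixture[:cut])
    ]


def parse_full(data: bytes) -> list[list[float]]:
    # Toy strict parse: every token a float, consistent column count.
    lines = data.decode("ascii").splitlines()
    rows = [[float(tok) for tok in line.split()] for line in lines if line.strip()]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("inconsistent column count")
    return rows


def lenient_summary(data: bytes) -> int:
    # Toy lenient summary: counts leading numeric tokens and never raises,
    # mimicking the failure mode described in the findings.
    count = 0
    for tok in data.decode("ascii").split():
        try:
            float(tok)
        except ValueError:
            break
        count += 1
    return count


fixture = b"1.0 2.0\n3.0 4.0\n"
cuts = range(len(fixture) + 1)
lenient_gaps = disagreements(fixture, lenient_summary, parse_full, cuts)
strict_gaps = disagreements(fixture, parse_full, parse_full, cuts)
```

With the lenient summary, the sweep reports the offsets that leave the second row incomplete; with a summary that shares the strict parse, it reports none. A real corpus test would run the same loop over each fixture file and the project's actual reader functions.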