Parser adversarial review: strict VERT metadata, surfaced partial-load warnings #32

Merged
jacobson30-bot merged 1 commit into main from parser-adversarial-review on Jun 11, 2026
@jacobson30-bot

Contributor

Summary

Adversarial pass over the raw file decoders (io/readers/) using mutated real fixtures: truncation at arbitrary points, corrupted rows, hidden header keys, partial files.

Findings (fixed)

[WARN] Createc VERT metadata summary was lenient where the full parse is strict: the deprecated np.fromstring silently stops at the first bad token, so a corrupt or truncated table was summarised as healthy in browse (and in the metadata cache) while the viewer's full parse raised. The summary now validates per-row column counts and floats exactly like the full parse. The old test that pinned the leniency now pins the symmetric contract.
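
As an illustration of the strict contract described above, here is a minimal sketch of per-row validation shared by the summary and the full parse. The function names (`parse_rows_strict`, `summarise`) and the whitespace-separated row layout are assumptions for this example, not the repository's actual reader code.

```python
def parse_rows_strict(lines):
    """Parse whitespace-separated float rows, raising on any malformed row.

    Hypothetical helper: every non-blank row must have the same number of
    columns as the first one, and every token must parse as a float.
    """
    rows = []
    n_cols = None
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # assumption: blank lines are not data rows
        if n_cols is None:
            n_cols = len(tokens)  # the first data row fixes the column count
        elif len(tokens) != n_cols:
            raise ValueError(
                f"row {lineno}: expected {n_cols} columns, got {len(tokens)}"
            )
        try:
            rows.append([float(tok) for tok in tokens])
        except ValueError:
            raise ValueError(f"row {lineno}: non-numeric token in {line!r}") from None
    return rows


def summarise(lines):
    """Metadata summary that runs the same validation as the full parse."""
    rows = parse_rows_strict(lines)
    return {"rows": len(rows), "columns": len(rows[0]) if rows else 0}
```

Because the summary calls the same validator, a table that the full parse rejects cannot be summarised as healthy.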

[WARN] scan.warnings was recorded and then dropped: the SXM/SM4 readers handle partial files correctly (complete planes only, with explanations like "file may have been incompletely written, missing planes: 3"), but nothing consumed the explanation, so users saw missing channels with no reason given. ViewerScanData now carries the warnings and the viewer shows them at load.
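
A rough sketch of the hand-off described above: reader warnings are copied onto the viewer's data object and turned into a status message. `ViewerScanData` and `scan_warnings` are named in this PR; the `Scan` class, the field layout, and the helper functions are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Scan:
    """Hypothetical reader result: loaded planes plus any reader warnings."""
    planes: dict
    warnings: list = field(default_factory=list)


@dataclass
class ViewerScanData:
    """Hypothetical viewer-side container; only scan_warnings is from the PR."""
    planes: dict
    scan_warnings: tuple = ()


def to_viewer_data(scan):
    # Carry the warnings forward instead of dropping them at this boundary.
    return ViewerScanData(planes=dict(scan.planes),
                          scan_warnings=tuple(scan.warnings))


def status_message(data):
    """Text for the status bar at load (the wording is an assumption)."""
    if not data.scan_warnings:
        return "Loaded"
    return "Loaded with warnings: " + "; ".join(data.scan_warnings)
```

Copying the warnings into an immutable tuple keeps the viewer's view of them stable even if the reader object is reused.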

[INFO] Nanonis spec single-column files failed the column-count check (the 1-D loadtxt result was reshaped to one wide row); the shape is now disambiguated via the column-header length.
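
The ambiguity behind this finding is that `np.loadtxt` returns a 1-D array both for a single-column file and for a single-row file. A minimal sketch of using the header length to pick the right 2-D shape follows; `as_table` and its argument names are hypothetical, not the reader's actual code.

```python
import io

import numpy as np


def as_table(data, n_header_cols):
    """Return `data` as a 2-D (rows, columns) array matching the header.

    Hypothetical helper: a 1-D loadtxt result is treated as one column when
    the header names a single column, and as one row otherwise.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 2:
        table = data
    elif n_header_cols == 1:
        table = data.reshape(-1, 1)   # N values in one column
    else:
        table = data.reshape(1, -1)   # one row of several columns
    if table.shape[1] != n_header_cols:
        raise ValueError(
            f"expected {n_header_cols} columns, got {table.shape[1]}"
        )
    return table


# Usage: a single-column file loads as 1-D and becomes an (N, 1) table.
single_column = np.loadtxt(io.StringIO("1.0\n2.0\n3.0\n"))
table = as_table(single_column, n_header_cols=1)
```

The column-count check stays in place after the reshape, so a genuine mismatch between header and data still fails.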

Verified sound (now pinned)

Createc DAT's excellent truncation diagnostics and dimension guards; Nanonis spec's strict symmetric failures; SXM/SM4 partial-file tolerance (complete, fully finite planes + warnings); VERT partial-sweep row-count warnings; chunked header reads across the 64 KiB boundary. Corpus sweep: metadata ↔ full-parse agreement asserted over every real fixture in test_data. Disproved: VERT Vpoint threshold units (mV, consistent).
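
For the chunk-boundary case mentioned above, one common way to find a marker that may straddle two chunks is to resume each search a few bytes before the end of the data already scanned. This is a sketch under assumptions: the `read_header` name, the `b"DATA"` marker value, and the error text are illustrative, not the streaming reader's real interface.

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB, as in the boundary case described above


def read_header(stream, marker=b"DATA", chunk_size=CHUNK_SIZE):
    """Return the bytes that precede `marker`, reading `stream` in chunks.

    Raises ValueError if the stream ends before the marker is seen.
    """
    buf = bytearray()
    search_from = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("marker not found: file may be truncated")
        buf += chunk
        idx = buf.find(marker, search_from)
        if idx != -1:
            return bytes(buf[:idx])
        # Re-scan the last len(marker) - 1 bytes next time, so a marker
        # split across two chunks is still found.
        search_from = max(0, len(buf) - (len(marker) - 1))


# Usage: the marker starts two bytes before the 64 KiB boundary.
header = b"h" * (CHUNK_SIZE - 2)
found = read_header(io.BytesIO(header + b"DATA" + b"payload"))
```

Tracking `search_from` avoids re-scanning the whole buffer on every chunk while still covering the overlap region.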

Test plan

  • tests/test_parser_adversarial.py: 36 tests over mutated real fixtures.
  • Full suite green locally: 2435 passed, 3 skipped.
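
To show the shape of such a test, here is a self-contained sketch of a truncation sweep that checks metadata and full-parse agreement. The parsers are toy stand-ins written for this example (including a deliberately lenient summary that stops at the first bad token), and none of the names come from tests/test_parser_adversarial.py.

```python
def full_parse(text):
    """Strict toy parser: equal column counts, every token a float."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("empty table")
    n_cols = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != n_cols:
            raise ValueError(f"row {i}: expected {n_cols} columns, got {len(row)}")
    return [[float(tok) for tok in row] for row in rows]  # float() may raise ValueError


def strict_summary(text):
    """Row count obtained through the same validation as the full parse."""
    return len(full_parse(text))


def lenient_summary(text):
    """Deliberately lenient stand-in: stops at the first bad token and
    never checks the shape of individual rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty table")
    n_cols = len(lines[0].split())
    count = 0
    for tok in text.split():
        try:
            float(tok)
        except ValueError:
            break
        count += 1
    return count // n_cols


def agrees(summary, text):
    """True if summary and full parse report the same row count or both fail."""
    try:
        expected = len(full_parse(text))
    except ValueError:
        expected = None
    try:
        got = summary(text)
    except ValueError:
        got = None
    return got == expected


def truncations(text):
    """Every non-empty prefix of the fixture text."""
    return [text[:n] for n in range(1, len(text) + 1)]


fixture = "0.0 1.0\n0.5 2.0\n1.0 3.0\n"
strict_ok = all(agrees(strict_summary, t) for t in truncations(fixture))
lenient_ok = all(agrees(lenient_summary, t) for t in truncations(fixture))
```

In this toy setup the strict summary agrees with the full parse on every prefix, while the lenient one disagrees on mid-row truncations such as `"0.0 1.0\n0.5"`.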

🤖 Generated with Claude Code

…rnings

Adversarial pass over io/readers (truncation, corruption, partial files),
pinned by the new tests/test_parser_adversarial.py corpus tests:

- Createc VERT metadata fast path used the lenient (and deprecated)
  np.fromstring, which silently stops at the first unparsable token: a
  corrupt or mid-row-truncated table summarised as healthy (full row count,
  no warnings) while the full parse raised — so browse and the metadata
  cache showed a spectrum the viewer could not load. The summary now
  validates rows exactly as strictly as the full parse (consistent column
  count, every token a float), matching the Nanonis reader's existing
  strict summary. The old test that pinned the leniency is updated to pin
  the symmetric contract instead.

- scan.warnings was consumed nowhere: the SXM/SM4 readers degrade
  gracefully on partial files (a scan still being written loads its
  complete planes only) and record exactly why — and the explanation was
  then dropped, leaving the user with silently missing channels.
  ViewerScanData now carries scan_warnings and the viewer shows them in the
  status bar at load.

- Nanonis spec: a valid single-column file failed the column-count check
  (loadtxt collapses one column to 1-D and it was reshaped to one wide
  row); the column-header length now disambiguates.

Verified sound, now pinned by corpus tests across every real fixture:
metadata <-> full-parse agreement for VERT / Nanonis spec / Createc DAT;
Createc DAT truncation diagnostics ("corrupt or truncated", byte counts,
format token) and dimension guards; Nanonis spec strict two-path failures;
SXM/SM4 partial-file tolerance returning only complete, fully-finite
planes with explicit warnings; VERT row-boundary truncation warning
(partial sweeps); 64 KiB chunk-boundary DATA-marker handling in the
streaming header reader. Disproved during review: the VERT Vpoint
time-trace threshold units (Vpoint.V is in mV, consistent with the data
column).

Co-Authored-By: Claude Fable 5 <[email protected]>
@jacobson30-bot jacobson30-bot merged commit dc41450 into main Jun 11, 2026