Skip to content

health skill: schema check produces BOM false positives (head -1 | grep '^---$' fails for UTF-8-BOM files) #46

@flyawayplus

Description

@flyawayplus

Summary

skills/health/SKILL.md Category 1 (Schema Compliance) uses a bash recipe that incorrectly flags UTF-8-BOM-prefixed Markdown files as missing YAML frontmatter. In a vault with BOM-prefixed notes (common on Windows authoring stacks), this produces large numbers of false positives, drowning real schema issues in noise.

Affected file

skills/health/SKILL.md line 84:

head -1 "$f" | grep -q '^---$' || echo "FAIL: $f — no YAML frontmatter"

Root cause

head -1 returns the first line verbatim, including a leading UTF-8 BOM (\xef\xbb\xbf) if present. grep '^---$' then fails because the line starts with the BOM bytes, not -. The file has perfectly valid frontmatter; the recipe just can't see it through the BOM.

Reproduction

# Create a BOM-prefixed file with valid frontmatter
printf '\xef\xbb\xbf---\ntitle: x\n---\n# body\n' > /tmp/bom.md

# Run the buggy recipe
head -1 /tmp/bom.md | grep -q '^---$' || echo "FAIL: /tmp/bom.md — no YAML frontmatter"
# → prints "FAIL: ..." (FALSE POSITIVE — the file is fine)

# Verify the BOM is what's confusing it
head -c 6 /tmp/bom.md | od -An -c
#   357 273 277   -   -   -        ← BOM + '---'

Observed impact (real-world)

In one WisteriaCortex-class vault (~3,142 notes, ~99% BOM-prefixed due to Windows + Obsidian Git workflow):

Metric Plugin recipe BOM-aware re-scan
Notes flagged as missing frontmatter 30 1
Compliance reported 99.0% 99.97%

29 of the 30 "missing-frontmatter" findings were BOM false positives. The one real issue was an evidence file intentionally kept without frontmatter — that genuine signal was effectively buried in the noise.

This was observed on arscontexta v0.8.0 (cache: ~/.claude/plugins/cache/agenticnotetaking/arscontexta/0.8.0/skills/health/SKILL.md; marketplace: identical md5).

Suggested fix

Replace line 84 with a BOM-tolerant variant. Two options:

Option A — minimal bash patch (drop-in)

head -c 6 \"\$f\" | sed 's/^\xef\xbb\xbf//' | grep -q '^---' \
  || echo \"FAIL: \$f — no YAML frontmatter\"
  • Reads only the first 6 bytes (faster than head -1 on large files)
  • Strips BOM if present via sed
  • Matches ^--- without anchoring $ so trailing newline isn't required

Option B — Python (more robust, matches the pattern already used by other skills)

python3 -c \"
import sys
with open(sys.argv[1], 'rb') as f:
    raw = f.read(6)
if raw.startswith(b'\xef\xbb\xbf'):
    raw = raw[3:]
sys.exit(0 if raw.startswith(b'---') else 1)
\" \"\$f\" || echo \"FAIL: \$f — no YAML frontmatter\"

Option A is the lighter touch; Option B is more defensive and would also catch other Unicode BOM variants (UTF-16 LE/BE) if those ever appear in a vault.

Related observations

This is one of three near-identical BOM-handling defects we hit in vault tooling in a single week. The other two were vault-local (queue_sync_check.py, _status.py) and are tracked locally; this is the first we've seen in an upstream plugin script. Suggested defensive default for the project:

All readers that inspect the first bytes of a Markdown file should strip BOM before pattern-matching. Either use utf-8-sig (Python) or strip via sed 's/^\xef\xbb\xbf//' (bash).

If helpful, I can prepare a PR with Option A. Let me know if you'd prefer a different shape (e.g., a single Python helper invoked from multiple skills).

Environment

  • arscontexta: v0.8.0
  • Vault size: ~3,142 notes
  • OS: Windows 11 + Git Bash / PowerShell
  • Editor: Obsidian + Obsidian Git plugin (auto-commit cycle is one source of BOM proliferation in this stack)

🤖 Reported with help from Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions