Guard against unaccounted observability conformance fixtures #180

Merged
chris-colinsky merged 3 commits into main from chore/observability-fixture-coverage-guard on Jun 23, 2026

Conversation

@chris-colinsky (Member)

What

The observability conformance harness silently pytest.skip-ped any pinned fixture not in the positive _SUPPORTED_FIXTURES allowlist, so a future unwired spec fixture would skip rather than fail CI. This adds a fail-on-unknown guard and converts the silent skips into explicit, documented accounting.

Changes

  • New test_observability_fixture_coverage_is_complete guard — every pinned observability fixture must be either run (_SUPPORTED_FIXTURES) or explicitly accounted for. Also catches stale entries (documented fixture no longer on disk) and supported/not-run overlaps.
  • Three documented buckets replace the silent skip:
    • _DEFERRED_FIXTURES (future capability): the 10 embedding fixtures (074-083) + 089, gated on the embedding capability (proposal 0059, lands v0.16.0); plus the nested-lineage Langfuse case (039), whose stale "0045 not implemented" reason is corrected (0045 is implemented; 039 defers for a nested-case harness limitation).
    • _UNIT_TESTED_FIXTURES (32): implemented behavior covered by the dedicated unit suite rather than the YAML harness, each cited to its covering file.
    • _CONVENTION_ONLY_FIXTURES (3): the proposal 0048 section 9 queryable-observer pattern, convention-only and doc-satisfied (no library surface).
  • Close two genuine gaps the accounting surfaced (fixtures 064/066): the active_prompt / active_prompt_group event-population path was implemented in the provider but only the observer's span rendering of an injected field was tested. Two new provider tests drive complete() inside with_active_prompt / with_active_prompt_group and assert the emitted event carries the record.
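
The accounting rule in the first bullet reduces to set arithmetic. The following is a minimal sketch of that idea, not the code in tests/conformance/test_observability.py: the function name `audit_fixture_coverage`, its parameters, and the message formats are hypothetical, and the real guard reads the pinned fixtures from disk rather than taking them as an argument.

```python
from typing import Dict, List, Set


def audit_fixture_coverage(
    pinned: Set[str],
    supported: Set[str],
    not_run: Dict[str, Set[str]],
) -> List[str]:
    """Return human-readable problems; an empty list means coverage is complete.

    pinned    -- fixture ids found on disk (hypothetical input)
    supported -- ids the harness actually runs
    not_run   -- documented buckets, e.g. {"deferred": {...}, "unit_tested": {...}}
    """
    problems: List[str] = []
    documented: Set[str] = set()
    for bucket, names in sorted(not_run.items()):
        # A fixture cannot both run in the harness and be documented as not run.
        for name in sorted(names & supported):
            problems.append(f"overlap: {name} is supported and in {bucket}")
        # A documented fixture that is no longer on disk is a stale entry.
        for name in sorted(names - pinned):
            problems.append(f"stale: {name} in {bucket} is not on disk")
        documented |= names
    # Anything pinned but neither run nor documented must fail, not skip.
    for name in sorted(pinned - supported - documented):
        problems.append(f"unaccounted: {name}")
    return problems


if __name__ == "__main__":
    problems = audit_fixture_coverage(
        pinned={"001", "039", "064", "074"},
        supported={"001"},
        not_run={"deferred": {"039", "074"}, "unit_tested": {"064"}},
    )
    assert problems == [], problems
```

A test built this way would assert that the returned list is empty, so a newly pinned fixture shows up by name in the failure message instead of as a skip.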

Coverage after this change

Of 98 pinned observability fixtures: 51 run in the harness, 32 covered by unit tests, 11 deferred on the embedding capability (v0.16.0), 3 convention-only, 1 fixture-wiring-pending. Behavior-wise, only the 11 embedding fixtures await unimplemented capability; everything else is implemented and tested.
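
The tally above can be checked mechanically. The snippet below only restates the numbers quoted in this description; the dictionary keys are informal labels, not identifiers from the test module.

```python
# Bucket sizes as quoted in the coverage summary.
bucket_counts = {
    "run in harness": 51,
    "unit-tested": 32,
    "deferred on embedding capability": 11,
    "convention-only": 3,
    "fixture-wiring-pending": 1,
}

total = sum(bucket_counts.values())

# Fixtures 074-083 (ten of them) plus 089 make up the embedding group.
embedding_fixtures = len(range(74, 84)) + 1

print(total, embedding_fixtures)
```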

This change came out of the v0.15.0 release review (spec finding: the allowlist silent-skip).

The OpenAI provider populates LlmCompletionEvent.active_prompt /
active_prompt_group from the with_active_prompt context, but no test
drove complete() inside a real prompt context to assert the event
carries the record -- only the observer's span rendering of an injected
field was covered. Add two provider tests closing that gap (conformance
fixtures 064 / 066).

The conformance harness silently pytest.skip-ped any observability
fixture not in _SUPPORTED_FIXTURES, so a future unwired spec fixture
would not fail CI. Add test_observability_fixture_coverage_is_complete:
every pinned fixture must be run or explicitly accounted for; the guard
also catches stale entries and supported/not-run overlaps.

Restructure the silent skips into three documented buckets:
_DEFERRED_FIXTURES (future capability -- the embedding fixtures gated on
proposal 0059, plus the nested-lineage case, whose stale "0045 not
implemented" reason is corrected), _UNIT_TESTED_FIXTURES (32, each cited
to its covering unit-suite file), and _CONVENTION_ONLY_FIXTURES (the
0048 section 9 queryable-observer pattern, doc-satisfied).

Copilot AI review requested due to automatic review settings June 23, 2026 03:32

Copilot AI left a comment

Pull request overview

Adds a completeness guard to the observability conformance harness so newly pinned spec fixtures can’t silently skip CI, and fills two LLM provider event-coverage gaps around prompt context propagation.

Changes:

  • Add an explicit accounting/coverage guard for pinned observability fixtures and replace silent “unknown fixture” skips with documented buckets.
  • Document deferred, unit-tested, and convention-only observability fixtures (with reasons) and enforce no stale/overlapping entries.
  • Add unit tests ensuring OpenAIProvider.complete() populates LlmCompletionEvent.active_prompt / active_prompt_group from prompt context.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/unit/test_llm_provider.py | Adds provider unit tests for active_prompt / active_prompt_group on typed completion events. |
| tests/conformance/test_observability.py | Adds fixture coverage guard and explicit “not run here” accounting buckets to prevent silent skips. |

2 comment threads on tests/unit/test_llm_provider.py, both Outdated

PR #180 review: the active_prompt / active_prompt_group tests asserted
object identity (is), coupling them to the provider passing the exact
instance through. The contract is the populated record, so a reasonable
snapshot/copy refactor would break them. Use pydantic structural
equality (==), which is copy-robust and still checks the full record.
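
The difference between the two assertions can be shown in a few lines. This sketch uses a frozen dataclass as a stand-in for the pydantic record (an assumption made to keep the example dependency-free); pydantic models likewise compare by field values under `==`. `PromptRecord` is a hypothetical name.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptRecord:
    """Hypothetical stand-in for the pydantic prompt record."""
    name: str
    version: str


original = PromptRecord(name="greeting", version="1")
# replace() with no changes builds a new, field-for-field equal instance,
# which is what a snapshot/copy refactor in the provider would hand back.
snapshot = replace(original)

identity_holds = snapshot is original   # False: a different object
equality_holds = snapshot == original   # True: same field values

# An `is` assertion would fail after such a refactor; `==` still passes
# and still fails if any field differs.
changed = replace(original, version="2")
detects_change = changed != original    # True
```
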
@chris-colinsky merged commit d29c6b9 into main on Jun 23, 2026
5 checks passed
@chris-colinsky deleted the chore/observability-fixture-coverage-guard branch on June 23, 2026 03:43

2 participants