Merged
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -11,10 +11,11 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
- **Detached-trace invocation span** (proposal 0061, observability §4.4, spec v0.61.0). The OTel observer now synthesizes an `openarmature.invocation` span at the root of each detached trace (a detached subgraph and each detached fan-out instance), carrying the parent's shared `invocation_id` (detached mode is observer-side trace rendering, not a new run) and the detached unit's own `entry_node`; the detached subgraph / instance span nests under it. A raising detached subgraph surfaces ERROR plus the error category and an OTel exception event on both the parent dispatch span and the detached invocation span. This is observer-side only, with no graph-engine change; the Langfuse observer is unchanged (its Trace entity already plays the invocation-level-container role). Conformance fixtures 008 (rewritten) and 058 (newly wired) run in `test_observability`.
- **Per-attempt LLM spans under call-level retry** (proposal 0050, observability §5.5 / llm-provider §7.1). Completes proposal 0050, which shipped `partial` in v0.14.0 (failure-isolation middleware and the `complete(retry=...)` loop landed then; the per-attempt span surface was deferred). Under call-level retry the OTel observer now emits one `openarmature.llm.complete` span per attempt, each carrying `openarmature.llm.attempt_index` (0-based, 0..N-1, and 0 for a no-retry call). An intermediate failed attempt's span carries ERROR status plus its error category and the request-side attributes; the final attempt's span carries the terminal outcome and, on success, the full response surface. A python-internal `LlmRetryAttemptEvent`, dispatched once per attempt, is the sole source of the OTel span; the terminal `LlmCompletionEvent` / `LlmFailedEvent` stay one per call (payload, latency, Langfuse Generation) and no longer drive the OTel span. Langfuse renders one terminal Generation per call, with the per-attempt detail on the OTel span surface only (a spec-side §8 clarification to pin this is tracked, non-blocking). `conformance.toml` flips proposal 0050 to `implemented`; the call-level fixtures 056-058 are driven through the provider plus OTel observer and the single-attempt observability fixture 057 is wired.
- **Langfuse `trace.userId` / `trace.sessionId` population** (proposal 0064, observability §8.4.1, spec v0.62.0). The Langfuse observer now promotes a recognized `userId` key in the caller-supplied invocation metadata to Langfuse's first-class `trace.userId` field (the Users dashboard), additively: the key also remains at `trace.metadata.userId`. Promotion is automatic and unconditional; an absent key leaves `trace.userId` unset. The `LangfuseClient.trace()` surface (the Protocol, the in-memory client, and the SDK adapter) gains `session_id` / `user_id`. `trace.sessionId` is sourced from `openarmature.session_id`, which the sessions capability (proposal 0020) establishes; that capability is not yet implemented in python, so the `sessionId` plumbing is in place but dormant (no source) and unset in the interim. `conformance.toml` records proposal 0064 `partial` on that basis: fixture 084 cases 2/3/4 (not session-bound, `userId` present additively, `userId` absent) run, and the session-bound cases 1/5 defer until 0020. Langfuse-only: the OTel side already carries `openarmature.session_id` and `openarmature.user.*` as span attributes, and OTel has no trace-level session/user field.
+- **Per-fetch prompt cache control: `cache_ttl_seconds`** (proposal 0072, prompt-management §5 / §6, spec v0.63.0). `PromptBackend.fetch`, `PromptManager.fetch`, and `PromptManager.get` gain an optional `cache_ttl_seconds` read-side control: `None` preserves current behavior, `0` forces a fresh read past any client-side cache, and `N > 0` bounds a served entry's staleness to N seconds; a negative value is rejected at the manager. It governs only which cached entry may be served, not whether or how results are cached. The bundled filesystem backend is cacheless and ignores it; the bundled Langfuse backend forwards it to the Langfuse SDK's `get_prompt` cache. Conformance fixtures 033/034 run through a caching harness backend (conformance-adapter §6.8: `source_read_count` plus a controllable `advance_clock`).

### Changed

-- **Pinned spec advances v0.60.0 → v0.62.0** across the v0.15.0 cycle: v0.61.0 (proposal 0061, the detached-trace invocation span above) and v0.62.0 (proposal 0064, the Langfuse session/user population above). `conformance.toml` records 0061 `implemented` and 0064 `partial` (its `sessionId` half is dormant pending the sessions capability). Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.
+- **Pinned spec advances v0.60.0 → v0.63.1** across the v0.15.0 cycle: v0.61.0 (proposal 0061, the detached-trace invocation span above), v0.62.0 (proposal 0064, the Langfuse session/user population above), v0.63.0 (proposal 0072, the prompt cache control above), and the v0.63.1 patch (pipeline-utilities coverage fixtures 070/071 for the already-implemented 0069 / 0070 behavior, no new proposal). `conformance.toml` records 0061 / 0072 `implemented` and 0064 `partial` (its `sessionId` half is dormant pending the sessions capability). Proposal 0050 needed no pin bump of its own (it was already within the pin from its v0.42.0 acceptance); its v0.14.0 `partial` entry flips to `implemented` with the per-attempt span surface above.

## [0.14.0] — 2026-06-17

7 changes: 7 additions & 0 deletions conformance.toml
@@ -705,3 +705,10 @@ note = "The OTel observer synthesizes an openarmature.invocation span at the roo
status = "partial"
since = "0.15.0"
note = "The Langfuse observer promotes a recognized userId caller-metadata key to the first-class trace.userId (additive: the key also stays in trace.metadata.userId), and sets trace.sessionId from openarmature.session_id when present. trace.userId is LIVE (sourced from 0034 caller metadata): fixture 084 cases 2/3/4 (not-session-bound, userId present additive, userId absent) pass. partial because trace.sessionId is DORMANT -- openarmature.session_id is established by the sessions capability (0020, observability §5.6), unimplemented in python until v0.19.0, so there is no session_id source yet; the trace(session_id=) plumbing is wired end to end but the observer passes None. Fixture 084 session-bound cases 1 + 5 are deferred (per-case) pending 0020. Langfuse-only: no OTel change (the OTel side already carries openarmature.session_id + openarmature.user.* as span attributes; no trace-level OTel equivalent)."
+
+# Spec v0.63.0 (proposal 0072). Per-fetch cache_ttl_seconds read-side
+# control (prompt-management §5 / §6 + conformance-adapter §6.8).
+[proposals."0072"]
+status = "implemented"
+since = "0.15.0"
+note = "PromptBackend.fetch / PromptManager.fetch / get gain an optional cache_ttl_seconds read-side control (absent / None = current behavior; 0 = force a fresh read past any cache; N > 0 = bound a served entry's staleness to N seconds; negative is rejected). It governs only which cached entry MAY be served for this fetch, not whether / how the result is cached. python's bundled backends (filesystem, in-memory) are cacheless and treat it as a no-op; the manager threads it through the §9 fallback chain and rejects negatives. render is unchanged. The TTL semantics are exercised by a caching prompt-backend conformance-harness primitive (§6.8: caches by (name, label), source_read_count, advance_clock controllable clock); fixtures 033/034 pass. No production caching backend ships (per §5, cacheless backends no-op). The v0.63.1 pin also wires pipeline-utilities coverage fixtures 070/071 (already-implemented 0069/0070 behavior; no new proposal)."
26 changes: 26 additions & 0 deletions docs/concepts/prompts.md
@@ -47,6 +47,32 @@ Why two operations instead of one? Three reasons:
 The convenience `get()` operation gives you the single-call
 shape when you want it without removing the separability.

+## Refreshing cached prompts: `cache_ttl_seconds`
+
+`fetch` and `get` take an optional `cache_ttl_seconds` that controls how
+fresh a served prompt must be, for backends that maintain a client-side
+cache:
+
+- omitted / `None` keeps the backend's current behavior;
+- `0` forces a fresh read past any cache;
+- `N > 0` serves a cached entry only while it is younger than N seconds,
+  re-reading the source once it ages past N.
+
+A negative value is rejected. It is a read-side control: it governs which
+cached entry may be served for this fetch, not whether or how results are
+cached. Cacheless backends (the bundled filesystem backend) ignore it; the
+bundled Langfuse backend forwards it to the Langfuse SDK's own prompt cache.
+
+```python
+# Always re-read from the backend, bypassing any cache:
+fresh = await manager.fetch("greeting", "production", cache_ttl_seconds=0)
+
+# Serve a cached entry only if it's under five minutes old:
+recent = await manager.get(
+    "greeting", "production", {"user": "Alice"}, cache_ttl_seconds=300
+)
+```
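
For readers wondering what "honoring" the control looks like inside a caching backend, the three cases above can be sketched as below. This is an illustrative sketch only, not the conformance-harness backend or the Langfuse SDK cache: `TtlCachingFetcher`, `read_source`, and `clock` are hypothetical names, the cached value is plain text instead of a `Prompt`, and "current behavior" for `None` is assumed here to mean "serve any cached entry".

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class TtlCachingFetcher:
    """Hypothetical read-through cache illustrating the cache_ttl_seconds rules."""

    read_source: Callable[[str, str], Awaitable[str]]  # (name, label) -> template text
    clock: Callable[[], float]  # injectable so staleness can be driven by a test
    _entries: dict = field(default_factory=dict)  # (name, label) -> (stored_at, value)

    async def fetch(
        self, name: str, label: str = "production", *, cache_ttl_seconds: Optional[int] = None
    ) -> str:
        if cache_ttl_seconds is not None and cache_ttl_seconds < 0:
            raise ValueError("cache_ttl_seconds must be >= 0")
        key = (name, label)
        cached = self._entries.get(key)
        if cached is not None:
            stored_at, value = cached
            age = self.clock() - stored_at
            # None: assumed default of this sketch, serve whatever is cached.
            # 0: no non-negative age is < 0, so the source is always re-read.
            # N > 0: serve only while the entry is younger than N seconds.
            if cache_ttl_seconds is None or age < cache_ttl_seconds:
                return value
        value = await self.read_source(name, label)
        # The TTL only gated the read; the fresh result is still cached.
        self._entries[key] = (self.clock(), value)
        return value


async def demo():
    now = [0.0]   # fake clock, advanced by hand
    reads = []    # records every read that reaches the source

    async def read_source(name, label):
        reads.append((name, label))
        return f"template v{len(reads)}"

    cache = TtlCachingFetcher(read_source, clock=lambda: now[0])
    first = await cache.fetch("greeting")                          # cold miss
    now[0] = 100.0
    second = await cache.fetch("greeting", cache_ttl_seconds=300)  # 100 s old: served
    third = await cache.fetch("greeting", cache_ttl_seconds=0)     # forced fresh read
    now[0] = 500.0
    fourth = await cache.fetch("greeting", cache_ttl_seconds=300)  # 400 s old: re-read
    return [first, second, third, fourth], len(reads)


results, source_reads = asyncio.run(demo())
```

The injected clock plays the same role as the harness's controllable clock: staleness can be exercised without sleeping. Note that the TTL is evaluated per call, so two callers can apply different freshness bounds to the same cached entry.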

## Prompt identity

Every `Prompt` carries five identity fields:
4 changes: 3 additions & 1 deletion examples/chat-with-multimodal/main.py
@@ -274,7 +274,9 @@ class _NoFetchBackend:
     ``fetch()`` is never invoked.
     """

-    async def fetch(self, name: str, label: str = "production") -> Prompt:
+    async def fetch(
+        self, name: str, label: str = "production", *, cache_ttl_seconds: int | None = None
+    ) -> Prompt:
         raise NotImplementedError("example constructs prompts inline; fetch not used")


4 changes: 3 additions & 1 deletion examples/langfuse-observability/main.py
@@ -123,7 +123,9 @@ def __init__(self) -> None:
             },
         )

-    async def fetch(self, name: str, label: str = "production") -> Prompt:
+    async def fetch(
+        self, name: str, label: str = "production", *, cache_ttl_seconds: int | None = None
+    ) -> Prompt:
         if name != "mission-briefing":
             from openarmature.prompts import PromptNotFound

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -63,7 +63,7 @@ Specification = "https://github.com/LunarCommand/openarmature-spec"
 openarmature = "openarmature.cli:main"

 [tool.openarmature]
-spec_version = "0.62.0"
+spec_version = "0.63.1"

 [dependency-groups]
 dev = [
4 changes: 2 additions & 2 deletions src/openarmature/AGENTS.md
@@ -1,6 +1,6 @@
 # OpenArmature — Agent documentation

-*This is the agent guide bundled with the openarmature Python package, version 0.14.0 (spec v0.62.0). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*
+*This is the agent guide bundled with the openarmature Python package, version 0.14.0 (spec v0.63.1). For the full docs site see [openarmature.ai](https://openarmature.ai). For the canonical spec text see [openarmature.org/capabilities](https://openarmature.org/capabilities/). For project-specific conventions for the code you're editing, see the host project's `AGENTS.md` or `CLAUDE.md`.*

 ## TL;DR

@@ -10,7 +10,7 @@ OpenArmature is a workflow framework for LLM pipelines and tool-calling agents:

 ## Capability contracts

-_Sourced from openarmature-spec v0.62.0. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._
+_Sourced from openarmature-spec v0.63.1. Each entry below reproduces §1 (Purpose) and §2 (Concepts) of the capability's `spec.md` verbatim — including additions from accepted proposals that this Python implementation may not yet ship. For per-proposal implementation status (implemented / partial / textual-only / not-yet), see the `conformance.toml` manifest at the repo root. For the full spec text (execution model, error semantics, determinism, observer hooks, etc.) see the linked docs site._

 ### Capability: `graph-engine`

2 changes: 1 addition & 1 deletion src/openarmature/__init__.py
@@ -25,7 +25,7 @@
 """

 __version__ = "0.14.0"
-__spec_version__ = "0.62.0"
+__spec_version__ = "0.63.1"
 # Proposal 0052 (spec observability §5.1 / §8.4.1): canonical
 # package-registry name for this implementation. Surfaces on every
 # OTel invocation span as ``openarmature.implementation.name`` and on
10 changes: 9 additions & 1 deletion src/openarmature/prompts/backend.py
@@ -33,13 +33,21 @@ class PromptBackend(Protocol):
     original fetch time, not the cache hit time.
     """

-    async def fetch(self, name: str, label: str = "production") -> Prompt:
+    async def fetch(
+        self, name: str, label: str = "production", *, cache_ttl_seconds: int | None = None
+    ) -> Prompt:
         """Return the prompt registered as ``(name, label)``.

         ``label`` defaults to ``"production"``. Raises
         ``PromptNotFound`` if no prompt matches, and
         ``PromptStoreUnavailable`` if the backing store is unreachable.
         The returned ``Prompt`` carries its raw template plus
         metadata; rendering is the manager's job, not the backend's.
+
+        ``cache_ttl_seconds`` is a read-side cache control: ``None``
+        preserves the backend's current behavior, ``0`` forces a fresh
+        read past any client-side cache, and ``N > 0`` bounds a served
+        cached entry's staleness to N seconds. Cacheless backends ignore
+        it; caching backends honor it.
         """
         ...
7 changes: 6 additions & 1 deletion src/openarmature/prompts/backends/filesystem.py
@@ -143,13 +143,18 @@ def _resolve_sampling(self, name: str, label: str) -> SamplingConfig | None:
             )
         return _sampling_from_dict(cast(dict[str, Any], raw))

-    async def fetch(self, name: str, label: str = "production") -> Prompt:
+    async def fetch(
+        self, name: str, label: str = "production", *, cache_ttl_seconds: int | None = None
+    ) -> Prompt:
         """Read the prompt template and (optionally) its sidecar sampling config.

         Returns a ``Prompt`` whose ``version`` is the leading 16 hex
         chars of the template's SHA-256 and ``template_hash`` is the
         full digest. Raises ``PromptNotFound`` when the template is
         missing and ``PromptStoreUnavailable`` on other I/O errors.
+
+        The filesystem backend is cacheless, so ``cache_ttl_seconds`` is
+        accepted for protocol conformance and ignored.
         """
         path = self._template_path(name, label)
         try:
20 changes: 14 additions & 6 deletions src/openarmature/prompts/backends/langfuse.py
@@ -46,7 +46,9 @@ class LangfusePromptClient(Protocol):
     tests can supply a lightweight fake.
     """

-    def get_prompt(self, name: str, *, label: str = "production") -> TextPromptClient | ChatPromptClient: ...
+    def get_prompt(
+        self, name: str, *, label: str = "production", cache_ttl_seconds: int | None = None
+    ) -> TextPromptClient | ChatPromptClient: ...


 # Langfuse prompt `config` keys that line up with SamplingConfig's
@@ -89,10 +91,14 @@ class LangfusePromptBackend:
     def __init__(self, client: LangfusePromptClient) -> None:
         self._client = client

-    async def fetch(self, name: str, label: str = "production") -> Prompt:
+    async def fetch(
+        self, name: str, label: str = "production", *, cache_ttl_seconds: int | None = None
+    ) -> Prompt:
         # The Langfuse SDK's get_prompt is synchronous (and does its own
-        # client-side caching); run it off the event loop.
-        result = await asyncio.to_thread(self._get_prompt, name, label)
+        # client-side caching); run it off the event loop. The proposal
+        # 0072 cache_ttl_seconds control forwards to that SDK cache:
+        # None = SDK default, 0 = no cache (fresh), N = N-second bound.
+        result = await asyncio.to_thread(self._get_prompt, name, label, cache_ttl_seconds)

         if isinstance(result, ChatPromptClient):
             normalized = _normalized_langfuse_entries(result.prompt, name=name, label=label)
@@ -134,9 +140,11 @@ async def fetch(self, name: str, label: str = "production") -> Prompt:
             metadata=_metadata_from(result),
         )

-    def _get_prompt(self, name: str, label: str) -> TextPromptClient | ChatPromptClient:
+    def _get_prompt(
+        self, name: str, label: str, cache_ttl_seconds: int | None = None
+    ) -> TextPromptClient | ChatPromptClient:
         try:
-            return self._client.get_prompt(name, label=label)
+            return self._client.get_prompt(name, label=label, cache_ttl_seconds=cache_ttl_seconds)
         except NotFoundError as exc:
             raise PromptNotFound(
                 f"prompt ({name!r}, {label!r}) not found in Langfuse",
22 changes: 19 additions & 3 deletions src/openarmature/prompts/manager.py
@@ -110,7 +110,9 @@ def _resolve_label(self, label: str | None, name: str) -> str:
             return self._label_resolver.resolve(name)
         return SPEC_FALLBACK_LABEL

-    async def fetch(self, name: str, label: str | None = None) -> Prompt:
+    async def fetch(
+        self, name: str, label: str | None = None, *, cache_ttl_seconds: int | None = None
+    ) -> Prompt:
         """Consult composed backends in order, applying the fallback chain.

         Label is resolved by a three-step chain: explicit argument >
@@ -123,12 +125,23 @@ async def fetch(self, name: str, label: str | None = None) -> Prompt:
         - ``PromptStoreUnavailable`` from a backend continues to the
           next. After ALL backends are exhausted with unavailable
           failures, the manager raises ``PromptStoreUnavailable``.
+
+        ``cache_ttl_seconds`` is a read-side cache control forwarded to
+        each backend's ``fetch``: ``None`` keeps
+        current behavior, ``0`` forces a fresh read, ``N > 0`` bounds a
+        served entry's staleness to N seconds; a negative value is
+        rejected. Cacheless backends ignore it.
         """
+        if cache_ttl_seconds is not None and cache_ttl_seconds < 0:
+            raise ValueError(
+                f"cache_ttl_seconds must be >= 0 (got {cache_ttl_seconds!r}); "
+                "None preserves current behavior, 0 forces a fresh read"
+            )
         resolved_label = self._resolve_label(label, name)
         causes: list[BaseException] = []
         for backend in self._backends:
             try:
-                return await backend.fetch(name, resolved_label)
+                return await backend.fetch(name, resolved_label, cache_ttl_seconds=cache_ttl_seconds)
             except PromptNotFound:
                 raise
             except PromptStoreUnavailable as exc:
@@ -520,13 +533,16 @@ async def get(
         variables: Mapping[str, Any] | None = None,
         *,
         placeholders: Mapping[str, Sequence[Message]] | None = None,
+        cache_ttl_seconds: int | None = None,
     ) -> PromptResult:
         """Convenience equivalent to ``render(await fetch(name, label), variables)``.

         ``label`` follows the same three-step resolution as :meth:`fetch`.
         ``placeholders`` is forwarded to :meth:`render`.
+        ``cache_ttl_seconds`` is forwarded to :meth:`fetch` (the read-side
+        cache control).
         """
-        prompt = await self.fetch(name, label)
+        prompt = await self.fetch(name, label, cache_ttl_seconds=cache_ttl_seconds)
         return self.render(prompt, variables, placeholders=placeholders)

