
Skills, File System, Console capabilities via pydantic-ai vs custom artifact store. #88

@NISH1001

pydantic-ai is the direction we will take, so we want to minimize custom implementation.

Claude helped me work through the design comparing our ArtifactStore (custom implementation) vs pydantic-ai capabilities. Verdict + plan below.

Scope: read-only for skills/artifacts (knowledge inputs), read+write for per-session workspaces (ephemeral local — papers, plots, generated files). Agent-driven writes back to GitHub remain out of scope.


ArtifactStore vs pydantic-ai capabilities — finalized verdict

Scope

In akd-ext, agents need to:

  1. Read skills/artifacts (knowledge inputs) — bundled with the package and/or hosted on a public GitHub repo. Read-only.
  2. Read + write a per-session workspace — ephemeral local directory where the agent produces outputs (papers, plots, generated files) the user can inspect at end of chat.

Agent-driven writes back to GitHub (publishing produced artifacts back to a repo) are explicitly out of scope for this plan.

(akd-ext nomenclature: what pydantic-ai-skills calls a "skill" we call an "artifact"; same shape, just naming. We use the terms interchangeably below; on disk both are SKILL.md files with YAML frontmatter.)

Verdict

ArtifactStore is fully redundant. Adopt pydantic-ai-skills for the read side and pydantic-ai-backend for the session-workspace read/write side. Both responsibilities map cleanly onto pydantic-ai capabilities; neither requires custom infrastructure. The work-in-progress on feature/github-store (GitHubArtifactStore) is paused — neither merged nor deleted — because GitHub access in akd-ext is read-only via GitSkillsRegistry, and writes to GitHub are out of scope. Immediate priority: migrate closed_loop_cm1 agents off their eager system-prompt context injection (12k+ lines of cm1_readme.md per call) to SkillsCapability's progressive disclosure, for the token win.

Three needs → two pydantic-ai capabilities

| Agent need | pydantic-ai answer | Custom code |
|---|---|---|
| Read skills/artifacts from local dirs (e.g. closed_loop_cm1/context/*.md bundled in the package) | SkillsCapability(directories=[...]) from pydantic-ai-skills | None |
| Read skills/artifacts from a public GitHub repo | SkillsCapability(registries=[GitSkillsRegistry(repo_url=...)]) — clones, caches, supports auth, version pinning | None |
| Read + write a per-session workspace (papers, plots, generated files) | ConsoleCapability(backend=LocalBackend(root_dir=session_path), permissions=DEFAULT_RULESET) from pydantic-ai-backend | None (per-session dir lifecycle is already handled by pydantic-ai-backend's SessionManager) |

Both SkillsCapability configurations can coexist on a single agent. The ConsoleCapability composes alongside them — they're orthogonal: skills are loaded knowledge (immutable inputs), session workspace is generated output (ephemeral, local-only). The agent sees one unified set of tools.

SkillsCapability ships read-only progressive disclosure: at construction it injects skill name + description into the system prompt (cheap), then exposes list_skills / load_skill(name) / read_skill_resource(...) / run_skill_script(...) tools so the model fetches full content only when relevant.
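The disclosure pattern itself is small enough to sketch in plain Python. The `SkillIndex` class below is hypothetical (stdlib only, not the pydantic-ai-skills API); it only illustrates the split between cheap metadata for the system prompt and on-demand full-content loads:

```python
from pathlib import Path

class SkillIndex:
    """Minimal illustration of progressive disclosure: cheap metadata up
    front, full content only on demand. Hypothetical class, not the
    pydantic-ai-skills API."""

    def __init__(self, directory: Path):
        # Map skill name -> SKILL.md path; only paths are held in memory.
        self._skills = {
            p.parent.name: p for p in sorted(directory.glob("*/SKILL.md"))
        }

    def list_skills(self) -> list[str]:
        # Cheap: names only, suitable for system-prompt injection.
        return list(self._skills)

    def load_skill(self, name: str) -> str:
        # Expensive: full markdown body, fetched only when the model asks.
        return self._skills[name].read_text()
```

The real capability adds `read_skill_resource` / `run_skill_script` on top, but the token economics come entirely from this list-then-load split.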

ConsoleCapability ships ls / read / write / edit / glob / grep filtered by permission rulesets (READONLY / DEFAULT / STRICT / PERMISSIVE). With LocalBackend(root_dir=session_path), the agent is sandboxed inside the session workspace — it cannot read or write outside of it.
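The sandboxing guarantee reduces to a path-containment check on every operation. A sketch of what that check has to do (hypothetical helper, not the pydantic-ai-backend implementation — note the `resolve()` call, which is what defeats `..` traversal and symlink escapes):

```python
from pathlib import Path

def resolve_in_sandbox(root_dir: Path, candidate: str) -> Path:
    """Resolve a candidate path and reject anything escaping root_dir,
    including `..` traversal, absolute paths, and symlinks (handled by
    resolve()). Illustrative sketch of LocalBackend-style sandboxing."""
    root = root_dir.resolve()
    target = (root / candidate).resolve()
    if root != target and root not in target.parents:
        raise PermissionError(f"{candidate!r} escapes workspace {root}")
    return target
```

Every `ls` / `read` / `write` / `edit` call would pass through a check like this before the permission ruleset is even consulted.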

Why ArtifactStore is fully redundant — two facts

Fact 1: A real, present skills use case is burning tokens today

akd_ext/agents/closed_loop_cm1/capability_feasibility_mapper.py:369-374 (on feature/pydantic_ai_base_agent):

cluster_it_context: str = Field(
    default_factory=lambda: (Path(__file__).parent / "context" / "cluster_it.md").read_text(),
)
cm1_readme_context: str = Field(
    default_factory=lambda: (Path(__file__).parent / "context" / "cm1_readme.md").read_text(),
)

_create_agent (lines 436-442) concatenates both into agent.instructions on every construction. cm1_readme.md is 12,115 lines — currently injected into the system prompt on every model call. This is precisely the failure mode SkillsCapability's progressive disclosure was designed to fix.

Fact 2: ArtifactStore has zero call sites

Cross-branch grep for ArtifactStore, GitHubArtifact, LocalArtifact: hits only inside akd_ext/artifacts/ itself. No agent imports it. No tool consumes it. No system prompt builder uses __str__(). It's purely speculative infrastructure.

Dependency assessment (verified, not skimmed)

pydantic-ai-skills (Doug Trajano) — safe to depend on

  • 261 ⭐, 21 forks, v0.8.0 released April 21 2026
  • 17 issues total, 0 open — every reported issue closed
  • 20 PRs merged including headline features: SkillsCapability, GitSkillsRegistry, auto_reload, generic SkillsToolset[Any]
  • Recent commits: responsive iteration on real user reports
  • Single maintainer — real risk, but the data format (markdown + YAML frontmatter) is fully portable; rebuild cost if abandoned is ~1 day for a homegrown ~150 LOC AbstractCapability subclass

pydantic-ai-backend (Vstorm) — safe to depend on

  • 83 ⭐, 19 forks, v0.2.5 released April 20 2026
  • 3 issues total, 0 open
  • Multi-contributor (Kacper Włodarczyk, DEENUU1, ilayu, community PRs) — better bus factor than skills
  • Recent activity: docker+daytona session manager, async protocol, sandbox sessions
  • Used here only for local-filesystem ConsoleCapability (per-session workspace). No GitHub backend, no external network.

Both are alpha-stage but actively maintained. No abandonment signals.

Action sequence

Prerequisite: Merge feature/pydantic_ai_base_agent

Everything below builds on the pydantic-ai foundation. Until that branch lands on develop (or whatever your integration branch is), the capability work has no place to live. Confirm mergeability and ship it first.

Step 1 — Adopt SkillsCapability for closed_loop_cm1, local-only (priority: token win, ~1 day)

On a new branch off the now-merged pydantic-ai base:

  • pyproject.toml: add pydantic-ai-skills>=0.8.0
  • Convert akd_ext/agents/closed_loop_cm1/context/cluster_it.md and cm1_readme.md into SKILL.md format with YAML frontmatter:
    ---
    name: cluster-it-infrastructure
    description: NCAR/Frontera cluster compute resources, scheduling, storage layout for HPC feasibility analysis.
    ---
    
    <existing markdown body>
    
    Each lives at e.g. akd_ext/agents/closed_loop_cm1/skills/cluster-it-infrastructure/SKILL.md.
  • akd_ext/agents/closed_loop_cm1/capability_feasibility_mapper.py:
    • Delete cluster_it_context and cm1_readme_context config fields (lines 369-376)
    • Delete the extra += concatenation in _create_agent (lines 436-442)
    • Add SkillsCapability(directories=[Path(__file__).parent / "skills"]) to the agent's capability list. Read _base/pydantic_ai/_capabilities.py first to find the composition site.
  • Apply the same pattern to peers in closed_loop_cm1/: experiment_implementation.py, workflow_spec_builder.py, research_report_generator.py, interpretation_paper_assembly.py.
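The conversion in the second bullet is mechanical enough to script. A sketch (hypothetical helper; the name/description values passed in are illustrative — the real ones come from the frontmatter example above):

```python
from pathlib import Path

def wrap_as_skill(src_md: Path, skills_root: Path,
                  name: str, description: str) -> Path:
    """Wrap an existing context .md file in SKILL.md format: YAML
    frontmatter (name + description) followed by the untouched body.
    Illustrative one-off migration helper, not part of pydantic-ai-skills."""
    skill_dir = skills_root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    out = skill_dir / "SKILL.md"
    frontmatter = f"---\nname: {name}\ndescription: {description}\n---\n\n"
    out.write_text(frontmatter + src_md.read_text())
    return out
```

Running it once per context file yields the `skills/<name>/SKILL.md` layout SkillsCapability expects, with the markdown bodies unchanged.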

Verification:

  • uv run pytest tests/agents/test_capability_feasibility_mapper.py -v (existing tests; may need updates if they assert on the deleted config fields)
  • Run a representative query against the agent. Confirm via debug logging that cm1_readme.md is not in the initial system prompt, is available via list_skills, and that load_skill("cm1-readme") returns the content when called
  • Compare token counts before/after on a representative input — the win should be visible

Step 2 — Session-workspace ConsoleCapability (when first agent produces file outputs)

When the first agent needs to write outputs the user wants to inspect (likely research_report_generator or interpretation_paper_assembly):

  • pyproject.toml: add pydantic-ai-backend>=0.2.5
  • Decide on session-workspace lifecycle: per-chat-session temp dir vs. per-user persistent dir vs. pydantic-ai-backend's SessionManager. Probably: a small wrapper around SessionManager that scopes a workspace dir to the chat session and exposes the path to the agent at run time.
  • Wire into agents that produce outputs:
    capabilities=[
        SkillsCapability(directories=[...]),   # from Step 1
        ConsoleCapability(
            backend=LocalBackend(root_dir=session_workspace_path),
            permissions=DEFAULT_RULESET,
        ),
    ]
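One possible shape for the lifecycle wrapper mentioned above, using a stdlib temp dir and assuming per-chat-session scoping (`SessionWorkspace` is a hypothetical name, not pydantic-ai-backend's SessionManager):

```python
import shutil
import tempfile
from pathlib import Path

class SessionWorkspace:
    """Scope an ephemeral workspace dir to one chat session. The path would
    be handed to ConsoleCapability's LocalBackend at agent run time; cleanup
    happens when the session ends. Hypothetical wrapper, illustration only."""

    def __init__(self, session_id: str):
        self.path = Path(tempfile.mkdtemp(prefix=f"akd-session-{session_id}-"))

    def close(self, keep: bool = False) -> None:
        # keep=True leaves the dir in place so the user can inspect outputs
        # (papers, plots, generated files) after the chat ends.
        if not keep:
            shutil.rmtree(self.path, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Whether `keep` defaults to True (user inspects outputs) or False (truly ephemeral) is exactly the lifecycle decision flagged in the bullet above.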

Verification:

  • Run an agent that produces a file (e.g. a markdown report) in its workspace. Confirm the file lands at session_workspace_path and is accessible to the user post-run.
  • Confirm sandboxing: agent attempts to read/write outside root_dir should fail.

Step 3 — Add GitSkillsRegistry once a public skills/artifacts repo exists (read-only, ~half day)

When the canonical AKD skills/artifacts repo is set up on GitHub (e.g. NASA-IMPACT/akd-skills or similar), wire it in:

SkillsCapability(
    directories=[Path(__file__).parent / "skills"],   # local fallback
    registries=[GitSkillsRegistry(
        repo_url="https://github.com/NASA-IMPACT/akd-skills",
        path="skills",
        target_dir="./.cache/akd-skills",
        clone_options=GitCloneOptions(depth=1, single_branch=True),
    )],
)

Agents pull canonical artifacts from GitHub at startup and use them via the same progressive-disclosure surface as local skills. Read-only — no commit/push semantics.
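What `depth=1` / `single_branch` buy can be sketched as the git invocation a read-only registry cache might issue (hypothetical helper; it only builds the command, no network, and GitSkillsRegistry's actual behavior may differ):

```python
from pathlib import Path

def git_fetch_command(repo_url: str, target_dir: str, depth: int = 1) -> list[str]:
    """Build the git command a read-only skills cache might run: a shallow,
    single-branch clone on first use, a fast-forward pull on cache hits.
    Illustrative sketch, not the GitSkillsRegistry implementation."""
    target = Path(target_dir)
    if (target / ".git").exists():
        # Cache hit: refresh in place. Read-only semantics — never push.
        return ["git", "-C", str(target), "pull", "--ff-only"]
    # First use: shallow clone keeps the cache small and startup fast.
    return ["git", "clone", "--depth", str(depth), "--single-branch",
            repo_url, str(target)]
```

The shallow clone is what keeps agent startup cheap even as the canonical skills repo accumulates history.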

Verification:

  • Tiny PydanticAIBaseAgent instantiation against the public repo. Confirm list_skills reports the GitHub-hosted artifacts, load_skill(...) returns content. Skip CI runs unless network is allowed.

Step 4 — Park feature/github-store

Don't merge. Don't delete. The PyGithub plumbing in akd_ext/artifacts/stores/github.py:14-210 (sha-aware writes, fast-path no-op detection, github_client= injection pattern) is good code — keep the branch as a parking lot. If a GitHub-write use case ever lands later (e.g. CARE agents publishing produced artifacts back to a repo), this is the implementation foundation to revive. Until then, leave it dormant.

Scope note

This plan covers read-only skills/artifact loading plus read-write session workspaces. Agent-driven writes to GitHub are intentionally out of scope here — defer that decision to a separate plan when a concrete write use case lands.

The mechanical work in Step 1 is well-scoped and ready to execute as a regular task once the pydantic-ai base is merged. Steps 2 and 3 unblock when their respective use cases / infrastructure are ready.
