Add policy guardrails for statistical evaluation artifacts #96

## Context

`agent-kernel` provides capability-based authorization, policy enforcement, context firewalling and audit for agent tool ecosystems.

A useful cross-repo scenario is an agent consuming a statistical/model-evaluation artifact, such as an offline policy evaluation report from `skdr-eval`. These artifacts can be misused if an agent treats a headline value estimate as deployment evidence while ignoring support diagnostics, uncertainty or warnings.

## Goal

Add a policy pattern for gating agent actions based on structured evaluation artifacts.

Example principle:

> An agent may summarize a high-risk evaluation artifact, but it must not recommend deployment or automatic rollout when support diagnostics are `high_risk`.

## Suggested capabilities / policy checks

Support a generic artifact policy layer that can inspect fields such as:

- `artifact_type`
- `support_health`
- `warnings`
- `uncertainty`
- `decision_stable`
- `recommendation.intent`
- `limitations`

Potential decisions:

- `allow_summary`
- `allow_manual_review_recommendation`
- `require_human_review`
- `deny_deployment_recommendation`
- `deny_automatic_rollout`
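
A minimal sketch of how such a layer could map inspected fields to the decisions above. Everything here is an assumption for illustration: `ArtifactView`, `Decision`, and `decide` are hypothetical names rather than existing `agent-kernel` APIs, and the mapping for `caution` and for unknown support states is one possible choice, not a settled rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set, Tuple


class Decision(str, Enum):
    """Decision names taken from the list above."""
    ALLOW_SUMMARY = "allow_summary"
    ALLOW_MANUAL_REVIEW_RECOMMENDATION = "allow_manual_review_recommendation"
    REQUIRE_HUMAN_REVIEW = "require_human_review"
    DENY_DEPLOYMENT_RECOMMENDATION = "deny_deployment_recommendation"
    DENY_AUTOMATIC_ROLLOUT = "deny_automatic_rollout"


@dataclass(frozen=True)
class ArtifactView:
    """Hypothetical, producer-agnostic view of the inspected fields."""
    artifact_type: str
    support_health: str  # assumed values: "ok", "caution", "high_risk"
    warnings: Tuple[str, ...] = ()
    decision_stable: bool = True


def decide(view: ArtifactView) -> Set[Decision]:
    """Return the set of decisions that apply to this artifact."""
    # Summarizing and recommending manual review are permitted in every state.
    decisions = {
        Decision.ALLOW_SUMMARY,
        Decision.ALLOW_MANUAL_REVIEW_RECOMMENDATION,
    }
    clean = view.support_health == "ok" and not view.warnings and view.decision_stable
    if clean:
        return decisions
    if view.support_health in ("ok", "caution"):
        # Assumption: a degraded but not high-risk artifact needs a human in the loop.
        decisions |= {
            Decision.REQUIRE_HUMAN_REVIEW,
            Decision.DENY_AUTOMATIC_ROLLOUT,
        }
        return decisions
    # "high_risk" and any unrecognized state fail closed (assumption).
    decisions |= {
        Decision.DENY_DEPLOYMENT_RECOMMENDATION,
        Decision.DENY_AUTOMATIC_ROLLOUT,
    }
    return decisions


if __name__ == "__main__":
    risky = ArtifactView("EvaluationArtifact", "high_risk", ("low_ess",))
    print(sorted(d.value for d in decide(risky)))
```

Note that the sketch combines several fields rather than relying on a single numeric metric, and it never reads a producer-specific field.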

## Example scenario

An agent receives an `EvaluationArtifact` with:

- candidate appears better than baseline;
- `support_health = high_risk`;
- warnings include low effective sample size (ESS) or poor overlap.

Expected behavior:

- allowed: summarize the result and explain the caveats;
- allowed: recommend improving logs/support;
- denied: recommend deployment, rollout, or automatic A/B promotion as if the result were reliable.
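
The scenario above could be expressed as a fixture artifact plus a small gate. This is a sketch under stated assumptions: the intent names, the `candidate_beats_baseline` field, and `is_allowed` are hypothetical, and the gate conservatively permits acting intents only when `support_health` is `ok`.

```python
# Intents that only describe or improve the evidence (hypothetical names).
SUMMARIZING_INTENTS = frozenset({"summarize", "recommend_improving_logs"})
# Intents that act on the evidence (hypothetical names).
ACTING_INTENTS = frozenset(
    {"recommend_deployment", "recommend_rollout", "auto_promote_ab"}
)

# Fixture artifact for the scenario; field values are illustrative only.
artifact = {
    "artifact_type": "EvaluationArtifact",
    "candidate_beats_baseline": True,  # headline result, deliberately ignored below
    "support_health": "high_risk",
    "warnings": ["low_ess", "poor_overlap"],
}


def is_allowed(intent: str, artifact: dict) -> bool:
    """Return True if the agent may carry out `intent` for this artifact."""
    if intent in SUMMARIZING_INTENTS:
        return True
    if intent in ACTING_INTENTS:
        # The headline result does not matter; only support diagnostics do.
        return artifact.get("support_health") == "ok"
    # Unknown intents are denied (fail closed, an assumption of this sketch).
    return False


if __name__ == "__main__":
    for intent in sorted(SUMMARIZING_INTENTS | ACTING_INTENTS):
        verdict = "allowed" if is_allowed(intent, artifact) else "denied"
        print(f"{intent}: {verdict}")
```

The gate never looks at `candidate_beats_baseline`, which is the point of the scenario: a favorable headline estimate cannot unlock an acting intent on its own.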

## Acceptance criteria

- [ ] Add a documented policy example for evaluation artifacts.
- [ ] Add tests for at least `ok`, `caution`, and `high_risk` support states.
- [ ] The policy is generic enough to support non-`skdr-eval` producers.
- [ ] Audit traces record why an action was denied or downgraded.
- [ ] Docs explain the distinction between summarizing evidence and acting on evidence.
- [ ] Align with the `weaver-spec` `EvaluationArtifact` contract if and when it becomes available.
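
As a rough illustration of the testing and audit criteria, the sketch below records why a request was allowed, downgraded, or denied, and checks the three support states. `AuditRecord` and `gate_with_audit` are hypothetical names, and downgrading `caution` to `require_human_review` is an assumed policy choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit entry explaining a policy outcome."""
    requested: str  # intent the agent asked for
    outcome: str    # "allow", "downgrade", or "deny"
    effective: str  # what the agent may actually do
    reason: str     # human-readable explanation for the trace


def gate_with_audit(requested: str, support_health: str) -> AuditRecord:
    """Gate an intent on support_health and explain the result."""
    if requested == "summarize":
        return AuditRecord(
            requested, "allow", requested,
            "summarizing evidence is permitted in every support state",
        )
    if support_health == "ok":
        return AuditRecord(requested, "allow", requested, "support_health=ok")
    if support_health == "caution":
        # Assumption: acting intents are downgraded to human review.
        return AuditRecord(
            requested, "downgrade", "require_human_review",
            "support_health=caution",
        )
    # "high_risk" and unrecognized states are denied.
    return AuditRecord(
        requested, "deny", "none", f"support_health={support_health}"
    )


# One expectation per support state named in the acceptance criteria.
EXPECTED = {"ok": "allow", "caution": "downgrade", "high_risk": "deny"}

if __name__ == "__main__":
    for state, expected in EXPECTED.items():
        record = gate_with_audit("recommend_deployment", state)
        assert record.outcome == expected, record
        print(record)
```

In a real test suite the loop would likely become a parametrized test over fixture artifacts, with the `reason` field asserted alongside the outcome.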

## Non-goals

- Do not implement OPE/statistical estimation in `agent-kernel`.
- Do not hard-code a dependency on `skdr-eval`.
- Do not make policy decisions based only on a single numeric metric.

## AI agent notes

This is a policy-safety example. Keep it small, generic and testable. Prefer fixture artifacts rather than external package dependencies.
