
feat: Phase 1+2 — 6 umbrella skills, curated anchors, evals, gap fixes#1

Merged
george-bafaloukas-forgerock merged 28 commits into main from pr/skills-refactoring on Jun 3, 2026


@george-bafaloukas-forgerock

Collaborator

Summary

  • 6 umbrella skills built and validated: ping-quickstart, ping-foundation, ping-orchestration, ping-universal-services, ping-app-integration, ping-identity-for-ai
  • 3-tier progressive disclosure architecture: metadata → SKILL.md (≤120 lines) → references/curated/ → references/generated/ → Docs MCP
  • 58 curated anchors across all 6 skills and 4 platform branches (PingOne MT, PingOne ST, Ping Software, cross-platform)
  • 14 generated reference manifests built by scripts/build_reference_manifests.py; CI workflow in .github/workflows/build-reference-manifests.yml
  • Strategy doc § 9 use-case evaluation — all 9 Getting Started 0–30 prompts evaluated; 4 gaps closed:
    • passkeys-and-passwordless.md — friction tiers, FIDO2/WebAuthn patterns across all 3 platforms
    • pingone-mt/themes-and-branding.md — branding, custom domain, email/SMS templates, DaVinci UI Studio
    • end-to-end-validation.md — cross-cutting validation matrix, test-user patterns, reporting format
    • server-side-integration-basics.md — backend OIDC, M2M, token exchange, CIBA, retry/429 resilience
  • Layer 1 eval (live, Bedrock claude-sonnet-4-6): all 6 skills PASS (≥90% trigger / 100% non-trigger / 100% ambiguous)
  • Eval prompt sets: 9 strategy use cases added across 5 skill YAMLs (T-50–T-55); all validated against schema

Test plan

  • scripts/build_reference_manifests.py --root . runs clean; 14 manifests regenerated
  • python3 -m evals.harness.validate_prompts — all prompt YAML files OK
  • python3 -m evals.harness.run_eval --adapter mock --layer 1 — 6/6 PASS
  • python3 -m evals.harness.run_eval --adapter mock --layer 2 — 6/6 PASS
  • python3 -m evals.harness.run_eval --adapter claude --layer 1 — 6/6 PASS (live Bedrock)
  • All SKILL.md files ≤120 lines
  • All new curated anchors have complete frontmatter
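
The SKILL.md line-count check in this test plan can be scripted. Below is a minimal sketch that assumes every skill keeps its `SKILL.md` somewhere under the repo root. `oversized_skill_files` is a hypothetical helper and is not part of this PR's tooling.

```python
from pathlib import Path

# Limit taken from the test plan above ("All SKILL.md files ≤120 lines").
MAX_SKILL_LINES = 120


def oversized_skill_files(root: Path, limit: int = MAX_SKILL_LINES) -> dict[str, int]:
    """Return {relative path: line count} for each SKILL.md longer than `limit`.

    Hypothetical helper: assumes SKILL.md files can live at any depth under `root`.
    """
    offenders: dict[str, int] = {}
    for path in sorted(root.rglob("SKILL.md")):
        # splitlines() gives the same count whether or not the file ends in a newline
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count > limit:
            offenders[path.relative_to(root).as_posix()] = line_count
    return offenders
```

A CI step could fail the build whenever the returned dict is non-empty.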

🤖 Generated with Claude Code

brando-dill and others added 28 commits May 14, 2026 11:44
Add commands/, rules/, evals/ top-level dirs and the multi-IDE
manifests required by strategy doc § 5: .claude-plugin/plugin.json,
.cursor-plugin/marketplace.json + plugin.json, plugins/ping-identity/
.claude-plugin/plugin.json, and a .well-known/agent-skills/index.json
stub for the discovery RFC.

Refs: PLAN.md Phase 0 step 1; strategy doc § 5
Move ping-quickstart's flat references into references/curated/, add
the runtime/ tier with a docs-mcp-routing.md stub to all three live
skills (ping-quickstart, ping-foundation, ping-orchestration), and
update ping-quickstart's SKILL.md paths to match.

Refs: PLAN.md Phase 0 step 3; strategy doc § 6
Add docs-mcp-routing.md stub to all three live skills' runtime/ tier.
Update ping-quickstart SKILL.md reference paths to curated/ subdirectory.

Refs: PLAN.md Phase 0 step 3; strategy doc § 6
…entity-for-ai

Three Phase 0 scaffolds (≤120 lines each, agentskills.io compliant)
matching strategy doc § 4 'In Practice'. Each ships SKILL.md, a
ping-marketplace.json metadata file, and the canonical
references/{curated,generated,runtime}/ tier set per strategy § 6.
Bodies are stubs marked status=Phase 0 scaffold; Phase 1 fills in
routing tables and curated anchors per strategy § 7.

Refs: PLAN.md Phase 0 step 4; strategy doc § 4 and § 7
Canonical decision rule for sandbox (docs) vs production (helix) runtime
plus the strategy-doc § 0 tier-discipline rule (curated → generated →
docs MCP). Referenced from every skill's references/runtime/
docs-mcp-routing.md and scored by Layer 3 of the eval.
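
The tier-discipline half of that rule is an ordered fallback. The sketch below is minimal and assumes each tier can be modeled as a lookup from topic to source. `Resolution`, `resolve`, and the topic keys are illustrative names, not the repo's API. The sandbox vs production runtime choice is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resolution:
    tier: str               # "curated", "generated", or "docs-mcp"
    source: Optional[str]   # anchor/manifest path, or None when deferring to docs MCP


def resolve(topic: str, curated: Dict[str, str], generated: Dict[str, str]) -> Resolution:
    """Consult tiers strictly in order: curated -> generated -> docs MCP."""
    if topic in curated:
        return Resolution("curated", curated[topic])
    if topic in generated:
        return Resolution("generated", generated[topic])
    # Nothing local covers the topic, so hand off to a live docs MCP query.
    return Resolution("docs-mcp", None)
```

Because the order is fixed, an eval can check tier discipline by looking at which tier a plan drew on.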

Refs: PLAN.md Phase 0 step 5; strategy doc § 0 and § 4
….mdc

Copy shared/templates/AUTHORING-RULES.md and shared/taxonomies/
routing-rules.md into rules/ so authors have a single source of truth
at the new repo root. Add rules/ping-identity.mdc (Cursor-style rule)
for parity with Cloudflare's rules/workers.mdc.

Refs: PLAN.md Phase 0 step 5; Cloudflare repo structure
Update plugin description and keywords to reflect the strategy-doc § 4
umbrella set.

Refs: PLAN.md Phase 0 step 1
Port shared/evals/routing-eval.md to evals/scorecards/ with a Layer 1
harness callout prepended. Add Layer 2 (anchor-selection-eval.md —
precision/recall against expected_anchors) and Layer 3
(plan-quality-eval.md — LLM-as-judge across correctness, completeness,
concreteness, tier discipline).
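
Layer 2's precision/recall comparison reduces to set arithmetic over anchor paths. The sketch below is minimal. It assumes the harness reduces each run to a list of selected anchors and that each prompt carries an `expected_anchors` list. The function name and the empty-set conventions are assumptions.

```python
from typing import Iterable, Tuple


def anchor_precision_recall(selected: Iterable[str], expected: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of the anchors a model selected vs expected_anchors."""
    sel, exp = set(selected), set(expected)
    hits = len(sel & exp)
    # Assumed conventions for empty sets: selecting nothing when nothing is
    # expected is perfect; selecting nothing when something is expected scores 0.
    precision = hits / len(sel) if sel else (1.0 if not exp else 0.0)
    recall = hits / len(exp) if exp else 1.0
    return precision, recall
```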

Refs: PLAN.md § Evaluation; strategy doc § 0
JSON Schema for evals/prompts/*.yaml + Python validator with 5 pytest
cases covering: minimal valid set, skill/filename match, required
top-level fields, required item fields, tier enum.

Run: python3 -m evals.harness.validate_prompts
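
The rule families those pytest cases cover can be sketched without the JSON Schema. This minimal illustration works on an already-parsed YAML document. The field names (`skill`, `prompts`, `id`, `prompt`, `tier`) and the tier values are assumptions; the real contract lives in evals/schemas/prompt-set.schema.json.

```python
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical field names and tier values, for illustration only.
REQUIRED_TOP_LEVEL = ("skill", "prompts")
REQUIRED_ITEM = ("id", "prompt", "tier")
TIERS = {"trigger", "non-trigger", "ambiguous"}


def validate_prompt_set(path: Path, data: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors: List[str] = []
    for name in REQUIRED_TOP_LEVEL:
        if name not in data:
            errors.append(f"missing top-level field: {name}")
    # Skill/filename match: evals/prompts/<skill>.yaml must declare skill: <skill>
    skill = data.get("skill")
    if skill is not None and skill != path.stem:
        errors.append(f"skill {skill!r} does not match filename {path.name!r}")
    for i, item in enumerate(data.get("prompts", [])):
        for name in REQUIRED_ITEM:
            if name not in item:
                errors.append(f"prompts[{i}]: missing field: {name}")
        tier = item.get("tier")
        if tier is not None and tier not in TIERS:
            errors.append(f"prompts[{i}]: invalid tier {tier!r}")
    return errors
```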

Refs: PLAN.md § Evaluation Layer 1
Live skills (ping-quickstart, ping-foundation, ping-orchestration) ship
at v1 minimums (10/5/3) since their bodies and curated content already
exist. The three new skills ship Phase-0 minimums (3/2/1) and expand in
Phase 1 once curated anchors land per strategy doc § 7.

All prompt sets validate against evals/schemas/prompt-set.schema.json.

Refs: PLAN.md § Evaluation, Phase 0 step 4
base.py defines the RunResult dataclass and LLMAdapter Protocol every
driver must satisfy. mock.py provides a rule-based adapter so harness
unit tests can assert scoring without hitting a real LLM API.
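
Here is a minimal sketch of that adapter contract using `typing.Protocol`. The `RunResult` fields, the `run` signature, and the keyword rules of the mock are assumptions rather than the contents of base.py and mock.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class RunResult:
    prompt_id: str
    selected_skill: Optional[str]            # None means "no skill triggered"
    selected_anchors: List[str] = field(default_factory=list)


class LLMAdapter(Protocol):
    """Anything with a matching `run` method satisfies this (structural typing)."""

    def run(self, prompt_id: str, prompt: str) -> RunResult: ...


class MockAdapter:
    """Rule-based stand-in: the first keyword found in the prompt picks the skill."""

    def __init__(self, rules: Dict[str, str]) -> None:
        self.rules = rules  # keyword -> skill name, checked in insertion order

    def run(self, prompt_id: str, prompt: str) -> RunResult:
        text = prompt.lower()
        for keyword, skill in self.rules.items():
            if keyword in text:
                return RunResult(prompt_id, skill)
        return RunResult(prompt_id, None)
```

With structural typing, a Bedrock-backed adapter only needs a matching `run` method and does not have to subclass anything.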

Refs: PLAN.md § Evaluation Layer 4 (cross-LLM)
run_eval.py loads evals/prompts/*.yaml, drives an LLMAdapter (mock or
claude), and scores routing accuracy (Layer 1) or anchor selection
(Layer 2) against the bars in evals/scorecards/. Three pytest cases
cover trigger pass, trigger fail, and Layer 2 recall/precision.
All 11 harness tests pass. Mock end-to-end: PASS for all 6 skills.

Run:
  python3 -m evals.harness.run_eval --adapter mock --layer 1
  python3 -m evals.harness.run_eval --adapter mock --layer 2
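
The Layer 1 pass/fail decision amounts to per-tier accuracy compared against the 90% / 90% / 80% (trigger / non-trigger / ambiguous) bars quoted elsewhere in this PR. The sketch below is minimal. The per-tier correctness rule, and especially the "acceptable skills" set for ambiguous prompts, are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Bars as quoted in this PR: trigger / non-trigger / ambiguous.
BARS = {"trigger": 0.90, "non-trigger": 0.90, "ambiguous": 0.80}


def is_correct(tier: str, target: str, selected: Optional[str], acceptable: Set[str]) -> bool:
    """Assumed per-tier rule for whether a routing decision counts as correct."""
    if tier == "trigger":
        return selected == target          # the skill under test must fire
    if tier == "non-trigger":
        return selected != target          # it must stay quiet
    return selected in acceptable          # ambiguous: any listed skill is fine


def layer1_scores(outcomes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Accuracy per tier from (tier, correct) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for tier, ok in outcomes:
        totals[tier] += 1
        hits[tier] += int(ok)
    return {tier: hits[tier] / totals[tier] for tier in totals}


def passes(scores: Dict[str, float], bars: Dict[str, float] = BARS) -> bool:
    # A tier with no prompts scores 0.0 here, so it fails the bar.
    return all(scores.get(tier, 0.0) >= bar for tier, bar in bars.items())
```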

Refs: PLAN.md § Evaluation Layers 1 and 2
Both runnable, both exit 0 with [stub] messages so Phase 1 can wire in
real LLM calls without churning file layout. ClaudeAdapter raises a
clear SystemExit until Phase 1 implements it.

Refs: PLAN.md § Evaluation Layers 3 and 4
Documents the 5-layer eval framework, how to run each layer locally,
and the per-PR authoring checklist that becomes a CI gate in Phase 3.

Phase 0 smoke check:
- validate_prompts: all 6 prompt YAMLs valid
- run_eval --layer 1 --adapter mock: PASS (6/6 skills)
- run_eval --layer 2 --adapter mock: PASS (6/6 skills)
- pytest evals/harness/tests/: 11 passed

Phase 0 exit criteria from PLAN.md: all green.

Refs: PLAN.md Phase 0 exit criterion
Author 12 curated anchors across ping-universal-services, ping-app-integration,
and ping-identity-for-ai (4 per skill). Expand all eval prompt YAMLs to spec
(≥10/5/3). Cross-link all new skills from ping-quickstart. Update plugin-map.md
and references/index.json.

Full repo audit (119 issues fixed across 43 files): frontmatter gaps, broken
index.json paths, routing placeholders, UI navigation language, missing Scope
sections, scaffold version markers, passive skill descriptions.

Implement ClaudeAdapter via Bedrock (eu.anthropic.claude-sonnet-4-6). Fix
composition.yaml schema handling in harness. All 6 skills pass Layer 1 eval
at ≥90/90/80% — results in evals/results/2026-06-01/claude.layer1.json.

Add PHASE-1-EXECUTION-PLAN.md, PLAN.md, and rewrite README with install
instructions, skills overview, eval status table, and delivery status.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n roles, and doc-sourced updates

- ping-quickstart: 3 new anchors (licensing, ForgeRock migration, inherited deployment orientation)
  and fixes to 3 existing anchors (console URLs, AIC URL structure, session timeout facts)
- ping-foundation: 4 new MT anchors (app-registration, sign-on-policies, directory-and-populations,
  admin-roles-and-access), 1 new ping-software anchor (pingaccess-basics), fixes to all 12 existing
  anchors (strip /latest/, correct console URLs, update last_updated)
- admin-roles-and-access.md: full coverage of built-in/custom roles, 3 admin onboarding methods,
  Administrators environment best practice — sourced from PingOne MT getting-started doc
- directory-and-populations.md: added groups section (static/dynamic, internal/external, group roles)
- sign-on-policies.md: added registration policy and passwordless SMS variants
- app-registration.md: added group-based access and SSO URL location
- index.json: added all 9 missing ping-foundation MT and ping-software anchor paths
- evals/results/2026-06-02: Layer 1 eval results — all 6 skills PASS (90%+ on all dimensions)
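
Gaps like these nine missing index.json paths, and the broken index.json paths fixed in the earlier audit, can be caught mechanically. The sketch below is minimal and assumes a hypothetical index layout: a top-level `"paths"` list of POSIX paths relative to the directory being scanned. `index_drift` is an illustrative name.

```python
import json
from pathlib import Path
from typing import List, Tuple


def index_drift(root: Path, index_file: Path) -> Tuple[List[str], List[str]]:
    """Compare curated anchors on disk with the paths listed in index.json.

    Returns (missing, dangling): anchors not yet indexed, and indexed paths
    that no longer exist. Assumes index.json holds a top-level "paths" list
    of POSIX paths relative to `root` (a hypothetical layout).
    """
    listed = set(json.loads(index_file.read_text(encoding="utf-8")).get("paths", []))
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*.md")
        if "curated" in p.relative_to(root).parts
    }
    return sorted(on_disk - listed), sorted(listed - on_disk)
```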

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gistration+MFA use case

- davinci-overview.md: rebuilt from scratch — logical operators table, node visual types,
  connector instance model, variable scopes, versioning (Save/Deploy/Revert), all 3 invocation
  methods (redirect/widget/API), redirect integration steps; URL fixed to /davinci/davinci_introduction.html
- davinci-flow-patterns.md: URL fixed to /davinci/flows/davinci_flows.html; sources updated
- davinci-registration-and-mfa.md: NEW — complete registration+email-verification flow,
  inline vs deferred MFA enrollment, risk step-up with PingOne Protect init/evaluate/update,
  error handling patterns, subflow extraction recommendations
- SKILL.md (ping-orchestration): new DaVinci routing row for registration+MFA use case
- All 17 PingOne ST anchors: last_updated bumped to 2026-06-02
- index.json: added davinci-registration-and-mfa.md path
- evals/results: Layer 1 all 6 PASS (same known near-misses T-07, T-09)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onfig, URL fixes

- protect-configuration.md: NEW — 14 named predictors with license tier (Risk vs Protect),
  risk policy types (Standard/Targeted), Signals SDK integration by platform, bot/AI agent
  detection note, configuration checklist, PingID Device Trust predictor
- verify-configuration.md: NEW — all verification types (GovernmentID, Facial Comparison,
  Liveness, Voice, Data-Based, DigiLocker), full policy field table, transaction lifecycle
  with all statuses including REQUIRES_REVIEW handling, IDA claims storage, timeout planning
- All 4 existing anchors: last_updated → 2026-06-03; source URLs fixed (were all 404)
- SKILL.md (ping-universal-services): routing rows for protect-configuration and verify-configuration
- index.json: added protect-configuration and verify-configuration paths
- evals/results: Layer 1 all 6 PASS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… Swift 6 notes, migration workflow

- SKILL.md: added companion SDK skills table routing to pingidentity/ping-sdk-agent-skills
  (7 specialist skills: android, ios, reactjs-journey, reactjs-davinci, javascript, sdk-router, migration)
- mobile-integration-basics.md: added full DaVinci collector type table (TextCollector, PasswordCollector,
  SubmitCollector, FlowCollector, SelectCollector, SsoCollector, QrCodeCollector, PhoneCollector,
  auto-advancing: ProtectCollector, DeviceAuthenticatorCollector); Swift 6 actor isolation notes
- web-integration-basics.md: extended Journey callback table to 21 types (added PollingWaitCallback,
  MetadataCallback, SuspendedTextOutputCallback, SelectIdPCallback, IdPCallback, KbaCreateCallback,
  ReCaptchaCallback, WebAuthn* callbacks, attribute input callbacks); added DaVinci collector section
- integration-troubleshooting-basics.md: added 9-phase automated migration workflow from
  forgerock-to-ping-journey-migration skill (comment markers, pre/post-flight build checks,
  never-delete-silently principle); added SDK skills reference
- All 4 anchors: last_updated → 2026-06-03; all dead /r/en-us/ URLs fixed
- evals/results: Layer 1 all 6 PASS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r, AIC AI Agents, CIBA, bot detection

- identity-for-ai-overview.md: expanded from 3 buckets to 5 pillars (Agent Identity, Agent Security,
  Agent Gateway, Agent Detection, Verified Trust + AI App Auth); new routing table with MCP/gateway
  entries; products-by-pillar section with AIC Rapid channel note; slug + sources fixed
- agent-security-patterns.md: Pattern 0 — agent registration (PingOne AI Agents feature,
  AIC /aiagent/register DCR endpoint, Rapid-channel availability note); Pattern 6 — CIBA human-in-the-loop
  approvals with CIBA flow + PingFederate/AIC support; Pattern 7 — bot/agentic AI detection via
  Protect predictor; sources fixed
- agent-gateway-mcp.md: NEW — PingGateway Agent Gateway module; McpAuditFilter/McpProtectionFilter/
  McpValidationFilter; RFC 8707 resource indicator requirement; version matrix (2026.3.0 current,
  2025.11.2 maintenance); OAuth AS choices (AIC/PingOne/PingFederate); architecture diagram;
  Cloudflare + AWS Bedrock variants; Evolving stability warning
- verified-trust-overview.md: slug + DaVinci applications URL fixed; last_updated bumped
- workforce-helpdesk-ai.md: slug + source URLs fixed; last_updated bumped
- SKILL.md: added MCP/gateway, CIBA, bot detection routing rows
- index.json: added agent-gateway-mcp.md path
- evals/results: Layer 1 all 6 PASS (ping-foundation now 100%; T-09 only remaining near-miss)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scripts/build_reference_manifests.py: scans curated anchors, scores by
  canonical/doc_type/recency/slug/status, generates top-N.json per branch;
  supports --dry-run; validates against reference-manifest-schema.json
- .github/workflows/build-reference-manifests.yml: triggers on curated anchor
  changes or builder script changes; validates all manifests post-build;
  auto-commits on main pushes
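
The builder's ranking step can be pictured as a weighted score per anchor followed by a per-branch top-N cut. The sketch below is minimal. The weights, the recency curve, and the `Anchor` fields are assumptions, not the values scripts/build_reference_manifests.py actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List

# Hypothetical weights; the real builder's scoring table is not shown in this PR.
DOC_TYPE_WEIGHT = {"overview": 2.0, "how-to": 1.5, "reference": 1.0}
STATUS_WEIGHT = {"current": 1.0, "draft": 0.0, "deprecated": -5.0}


@dataclass(frozen=True)
class Anchor:
    path: str
    branch: str          # e.g. "pingone-mt", "cross-platform"
    canonical: bool
    doc_type: str
    last_updated: date
    slug: str
    status: str


def score(a: Anchor, today: date) -> float:
    age_days = max((today - a.last_updated).days, 0)
    recency = 1.0 / (1.0 + age_days / 90)  # 1.0 today, 0.5 at 90 days old
    return (
        (3.0 if a.canonical else 0.0)
        + DOC_TYPE_WEIGHT.get(a.doc_type, 0.5)
        + recency
        + (0.5 if a.slug else 0.0)      # small bonus for a stable slug
        + STATUS_WEIGHT.get(a.status, 0.0)
    )


def top_n_per_branch(anchors: Iterable[Anchor], n: int, today: date) -> Dict[str, List[str]]:
    by_branch: Dict[str, List[Anchor]] = defaultdict(list)
    for a in anchors:
        by_branch[a.branch].append(a)
    # Highest score first; path breaks ties so output is deterministic.
    return {
        branch: [a.path for a in sorted(items, key=lambda a: (-score(a, today), a.path))[:n]]
        for branch, items in by_branch.items()
    }
```

N differs per manifest (top-10, top-15, top-20, or top-25 in the list of generated manifests), so a real builder would take it per branch rather than as a single argument.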

Generated manifests (14 total across 6 skills):
  ping-quickstart:         cross-platform/top-15.json  (6 docs)
  ping-foundation:         pingone-mt/top-25.json      (5 docs)
                           pingone-st/top-25.json      (5 docs)
                           ping-software/top-25.json   (3 docs)
                           cross-platform/top-10.json  (4 docs)
  ping-orchestration:      pingone-mt/top-25.json      (3 docs)
                           pingone-st/top-25.json      (17 docs)
                           ping-software/top-25.json   (0 — no curated anchors)
                           cross-platform/top-10.json  (0 — no curated anchors)
  ping-universal-services: cross-platform/top-20.json  (6 docs)
  ping-app-integration:    cross-platform/top-20.json  (4 docs)
  ping-identity-for-ai:    ai-identity/top-20.json     (5 docs)

PLAN.md: Phase 2 marked complete; Phase 3 marked next

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds project CLAUDE.md, Helix setup guide and DaVinci workflow doc,
superpowers Phase 0 restructure plan, .gitignore (excludes Playwright
session logs), and updated .claude/settings.local.json with sandbox
config and new permission entries.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Re-evaluated all 6 umbrella skills against the 9 Getting Started 0–30
use cases in the Ping Agent Skill Strategy doc. Closed 4 gaps:

- ping-orchestration: passkeys-and-passwordless.md (cross-platform)
  friction tiers (low / balanced / higher-assurance), registration
  patterns A/B/C, auth patterns 1–4, recovery, AIC + DaVinci +
  PingFederate node/connector mapping.

- ping-foundation: pingone-mt/themes-and-branding.md
  branding, custom domain, email/SMS templates, senders, DaVinci UI
  Studio, CSP, multi-brand patterns, pre-go-live checklist.

- ping-quickstart: end-to-end-validation.md (cross-cutting)
  validation matrix per layer, automatable vs manual today, test-user
  patterns, validation sequence, repeatable reporting format,
  sandbox/PII guardrails. Routing trigger added to SKILL.md.

- ping-app-integration: server-side-integration-basics.md
  confidential clients, JWT/introspection, refresh rotation, M2M
  client_credentials, CIBA, Token Exchange, retry/429/circuit-breaker,
  multi-environment.

SKILL.md routing tables updated for all four. Manifest builder rebuilt
all 14 generated shortlists; index.json updated. All SKILL.md files at
≤120 lines.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Layer 1 live run (Bedrock, claude-sonnet-4-6 [1m]) PASS for all 6
skills with new prompts included:

| Skill                     | Trigger | Non-trig | Ambig |
|---------------------------|---------|----------|-------|
| ping-app-integration      | 100%    | 100%     | 100%  |
| ping-foundation           | 100%    | 100%     | 100%  |
| ping-identity-for-ai      | 100%    | 100%     | 100%  |
| ping-orchestration        | 93%     | 100%     | 100%  |
| ping-quickstart           | 92%     | 100%     | 100%  |
| ping-universal-services   | 100%    | 100%     | 100%  |

Bar (trigger / non-trigger / ambiguous): 90% / 90% / 80%. All skills are above the bar.

Added T-50–T-55 strategy use case prompts across:
- ping-quickstart (uc-1, uc-9 ×2)
- ping-foundation (uc-2 ×2, uc-3 ×2, uc-8 ×2)
- ping-orchestration (uc-4 ×2, uc-5 ×2, uc-7)
- ping-app-integration (uc-6 ×4: web / mobile / server-side / M2M)
- ping-universal-services (uc-5: Protect, Verify)

Bug fix: run_eval._build_adapter() now skips composition.yaml when
loading mock rules — composition.yaml uses a different schema and
crashed the mock adapter builder.
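
The fix amounts to filtering the prompt directory before parsing. The sketch below is minimal and assumes the prompt sets sit flat in one directory. An exclusion list is one way to do the filtering. `prompt_set_files` is a hypothetical name, not the harness's actual code.

```python
from pathlib import Path
from typing import List

# composition.yaml sits next to the prompt sets but follows a different schema.
NON_PROMPT_FILES = {"composition.yaml"}


def prompt_set_files(prompts_dir: Path) -> List[Path]:
    """YAML files in `prompts_dir` that are per-skill prompt sets."""
    return sorted(
        p for p in prompts_dir.glob("*.yaml")
        if p.is_file() and p.name not in NON_PROMPT_FILES
    )
```

Each returned path could then be parsed, for example with PyYAML's `yaml.safe_load`, before the mock rules are built from it.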

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 + 2 complete — 6 umbrella skills, 58 curated anchors, 14
generated manifests, CI builder, eval framework with live Bedrock
Layer 1 PASS (6/6 skills above 90% bar).

Resolves LICENSE conflict: keep Ping Identity copyright from main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
george-bafaloukas-forgerock deleted the pr/skills-refactoring branch June 3, 2026 13:26