
feat(xtest): otdf-local multi-instance refactor#452

Open
dmihalcik-virtru wants to merge 17 commits into main from DSPX-3302-03-multi-instance


@dmihalcik-virtru

@dmihalcik-virtru dmihalcik-virtru commented May 15, 2026

Member

Summary

Refactors otdf-local from a single-instance CLI to a multi-instance harness. Each named instance under tests/instances/<name>/ owns its own opentdf.yaml, keys, KAS configs, and port range, and references platform binaries managed by otdf-sdk-mgr (PR #451).

Settings: adds instance_name, instance_dir, and instances_root. Per-instance paths take effect when instance.yaml exists; without it, legacy behavior is preserved.

Ports: KAS ports are computed as instance.ports.base plus a per-KAS offset from the KAS_OFFSETS table, so two instances on different bases can coexist.
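A minimal sketch of the base-offset arithmetic. It assumes only the three offsets the commit message quotes (alpha=+101, beta=+202, km2=+606); the rest of the real `KAS_OFFSETS` table is elided there. Treating the base itself as the platform port is also an assumption, and `port_range` is a hypothetical helper, not part of otdf-local.

```python
# Hypothetical subset of the offset table: only these offsets are quoted in the PR.
KAS_OFFSETS: dict[str, int] = {"alpha": 101, "beta": 202, "km2": 606}


def get_kas_port(name: str, *, base: int = 8080) -> int:
    """Return the port for KAS `name` in an instance whose ports.base is `base`."""
    offset = KAS_OFFSETS.get(name)
    if offset is None:
        raise ValueError(f"Unknown KAS instance: {name}")
    return base + offset


def port_range(base: int) -> set[int]:
    """All ports one instance claims: the base (assumed platform port) plus each KAS port."""
    return {base} | {get_kas_port(n, base=base) for n in KAS_OFFSETS}


# Two instances on different bases get disjoint port sets.
default_ports = port_range(8080)  # {8080, 8181, 8282, 8686}
alt_ports = port_range(9080)      # {9080, 9181, 9282, 9686}
```

Keeping bases more than 606 apart is a simple sufficient condition for this subset to stay disjoint, since every port an instance claims falls in `[base, base + 606]`.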

Services: PlatformService and KASService run the pinned xtest/platform/dist/<dist>/service binary when an instance is loaded; otherwise the legacy go run ./service path runs unchanged. KAS features (ec_tdf_enabled, etc.) come from instance.yaml.
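Roughly how the launcher choice could look, as a sketch. `build_launch` and `ServiceLaunch` are hypothetical names; the `xtest/platform/dist/<dist>/service` layout and the worktree cwd come from the PR text. The service's own subcommands and flags are left out because the PR doesn't spell them out.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class ServiceLaunch:
    """What to exec and where: argv plus the working directory."""
    argv: list[str]
    cwd: Path


def build_launch(
    xtest_root: Path,
    platform_dir: Path | None,
    pinned_dist: str | None,
    worktree: Path | None,
) -> ServiceLaunch:
    """Choose between an instance-pinned binary and the legacy `go run` path (hypothetical helper)."""
    if pinned_dist is not None:
        # Instance mode: run the otdf-sdk-mgr-managed binary, with cwd set to the
        # recorded worktree so the binary can find its embedded resources.
        if worktree is None:
            raise ValueError(f"pinned dist {pinned_dist!r} has no recorded worktree")
        binary = xtest_root / "platform" / "dist" / pinned_dist / "service"
        return ServiceLaunch([str(binary)], worktree)
    if platform_dir is None:
        raise ValueError("no instance pin and no legacy platform checkout")
    # Legacy mode: build and run from the sibling platform checkout.
    return ServiceLaunch(["go", "run", "./service"], platform_dir)
```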

New CLI surface:

  • Top-level --instance NAME
  • otdf-local instance init <name> [--from-scenario PATH] [--ports-base N] [--platform DIST]: scaffolds the instance directory, auto-generates keys, and writes opentdf.yaml with a fresh root key
  • otdf-local instance ls [--json], otdf-local instance rm <name> -y
  • otdf-local scenario run <path>: translates the scenario's suite block into pytest arguments

Other:

  • otdf-local/pyproject.toml declares otdf-sdk-mgr as a uv workspace dependency
  • .gitignore covers /instances/, xtest/scenarios/*.installed.json, .claude/tmp/
  • 5 new unit tests in test_multi_instance.py

Test plan

  • cd otdf-local && uv run pytest tests/ -m 'not integration' — 27 passing
  • uv run otdf-local instance init demo --from-scenario <path> — directory layout correct
  • uv run otdf-local instance ls --json — enumerates instance
  • uv run pyright — 0 errors

Jira: https://virtru.atlassian.net/browse/DSPX-3302

🤖 Generated with Claude Code

Stack (a60d3302):


Summary by CodeRabbit

  • New Features
    • Multi-instance CLI: init, list, and remove instances; select active instance via --instance.
    • Scenario runner: run scenarios.yaml against a selected instance.
  • Improvements
    • Services (platform/KAS) and Docker commands now use per-instance ports, config, and environment.
    • Instance-pinned binaries/config are used when available.
    • Enhanced key setup for multi-instance mode, including localhost TLS + ca.jks.
  • Tests
    • Added scenario→pytest argument translation tests and multi-instance smoke tests.
  • Chores
    • Updated .gitignore and local dependency configuration.

@coderabbitai

coderabbitai Bot commented May 15, 2026


📝 Walkthrough


Adds multi-instance support across settings, CLI commands, services, key generation, and tests. Instance manifests now drive port selection, filesystem layout, and pinned binaries, with legacy platform behavior retained as a fallback.

Changes

Multi-instance test harness

| Layer | File(s) | Summary |
|---|---|---|
| Dependency and gitignore configuration | otdf-local/pyproject.toml, .gitignore | Adds otdf-sdk-mgr as a runtime dependency with a local source override and ignores instance/test-harness artifacts. |
| Port arithmetic with base offset | otdf-local/src/otdf_local/config/ports.py | Replaces fixed KAS port constants with base-offset port calculation and exposes platform_port_for(). |
| Settings refactor for multi-instance awareness | otdf-local/src/otdf_local/config/settings.py | Adds instance selection, instance-scoped paths, optional platform-dir handling, instance-aware port/config resolution, and directory creation for instance layouts. |
| Root CLI wiring and instance option | otdf-local/src/otdf_local/cli.py | Registers new subcommands, adds --instance, updates readiness checks, and conditions environment exports on platform-dir availability. |
| Instance management CLI (init, ls, rm) | otdf-local/src/otdf_local/cli_instance.py | Introduces instance creation, listing, and removal commands plus instance-name validation and port-collision warnings. |
| Scenario execution CLI | otdf-local/src/otdf_local/cli_scenario.py | Adds scenario execution support that builds pytest arguments from scenario suites and runs them under the selected instance. |
| Docker service instance environment | otdf-local/src/otdf_local/services/docker.py | Passes per-instance compose environment variables into docker-compose subprocess calls. |
| KAS service instance pinning and config | otdf-local/src/otdf_local/services/kas.py | Uses instance-aware ports, instance pins, pinned binaries/worktrees, and instance-scoped KAS management. |
| Platform service instance pinning and config | otdf-local/src/otdf_local/services/platform.py | Uses instance-aware ports and worktrees, patches existing config in place, and starts pinned binaries when present. |
| Key and certificate generation for TLS and truststore | otdf-local/src/otdf_local/utils/keys.py | Adds localhost TLS and JKS generation, expands required key artifacts, and writes golden keys with absolute paths. |
| Scenario-to-pytest argument translation tests | otdf-local/tests/test_cli_scenario.py | Covers scenario-suite translation into pytest argv, including targets, containers, markers, and SDK tokens. |
| Multi-instance infrastructure smoke tests | otdf-local/tests/test_multi_instance.py | Covers port offsets, instance detection, instance loading, and instance-scoped paths. |

Estimated code review effort: 4 (Complex) | ~60 minutes

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant cli_scenario
  participant settings
  participant pytest
  User->>cli_scenario: run scenario.yaml with optional --instance
  cli_scenario->>settings: set OTDF_LOCAL_INSTANCE_NAME
  cli_scenario->>settings: clear cache and load xtest_root
  cli_scenario->>pytest: execute `uv run pytest` with built args
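A sketch of the diagram's steps, assuming settings are cached with `functools.lru_cache` (the diagram only says "clear cache"). `prepare_scenario_run` and this `Settings` are hypothetical; the function returns the command instead of spawning pytest so the ordering is easy to check.

```python
from __future__ import annotations

import functools
import os
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    instance_name: str | None


@functools.lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Cached settings; the environment is read only when the cache is empty."""
    return Settings(instance_name=os.environ.get("OTDF_LOCAL_INSTANCE_NAME"))


def prepare_scenario_run(instance: str, pytest_args: list[str]) -> list[str]:
    """Select `instance`, refresh cached settings, and return the pytest command."""
    # Set the variable first so this process and the pytest child both see it.
    os.environ["OTDF_LOCAL_INSTANCE_NAME"] = instance
    # Drop any Settings built before the change; otherwise the old instance wins.
    get_settings.cache_clear()
    cmd = ["uv", "run", "pytest", *pytest_args]
    print(shlex.join(cmd))  # shell-quoted display of the command
    return cmd
```

The ordering matters: setting the variable without clearing the cache would leave this process pointed at the previously cached instance while the pytest child sees the new one.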

Possibly related PRs

  • opentdf/tests#450: Provides shared otdf_sdk_mgr.schema models used for instance and scenario loading in this change.
  • opentdf/tests#451: Introduces scenario install artifacts that this PR now ignores and consumes.
  • opentdf/tests#427: Touches key generation and Docker/keytool handling adjacent to the new per-instance key flow.

Suggested reviewers: pflynn-virtru

Poem

🐰 I hopped by ports and burrows neat,
With instance maps beneath my feet.
Keys and scenarios lined up just so,
Then pytest ran — hop, hop, go!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.58%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is concise and accurately reflects the main change: a multi-instance refactor for otdf-local. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements a multi-instance refactor for the otdf-local CLI, enabling the management of isolated test environments. It introduces new instance and scenario subcommands, updates the configuration system to be instance-aware, and integrates with otdf-sdk-mgr for binary management. Service launchers for KAS and the platform now support per-instance port offsets and directory structures. Review feedback highlights a potential TypeError in KAS feature handling and suggests a more direct approach for updating Pydantic model metadata.

Comment thread otdf-local/src/otdf_local/services/kas.py Outdated
Comment thread otdf-local/src/otdf_local/cli_instance.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements a multi-instance architecture for the otdf-local CLI, allowing for the management and execution of isolated test environments. Key updates include new subcommands for instance and scenario handling, offset-based port allocation, and instance-specific directory structures for logs and configurations. Feedback from the review suggests several improvements: adding a null check for KAS features to avoid runtime errors, using Pydantic's model_copy for cleaner metadata updates, adopting shlex.join for safer command display, and adding missing type hints to enhance code maintainability.

Comment thread otdf-local/src/otdf_local/services/kas.py Outdated
Comment thread otdf-local/src/otdf_local/cli_instance.py Outdated
Comment thread otdf-local/src/otdf_local/cli_scenario.py Outdated
Comment thread otdf-local/src/otdf_local/config/settings.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a multi-instance test harness capability, allowing for the management and execution of isolated OpenTDF environments with distinct configurations, port ranges, and platform versions. Key additions include new CLI subcommands for instance management (init, ls, rm) and scenario execution, an instance-aware settings system, and integration with otdf-sdk-mgr to resolve versioned binaries. Feedback identifies a critical issue where the up command still relies on static port constants, which will break health checks for non-default instances. Additionally, improvements were suggested regarding safer dictionary handling for KAS features and more idiomatic use of Pydantic's model_copy.

Comment thread otdf-local/src/otdf_local/cli.py
Comment thread otdf-local/src/otdf_local/cli_instance.py Outdated
Comment thread otdf-local/src/otdf_local/services/kas.py Outdated
@github-actions


@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-02-platform-installer branch from c6a7895 to ebc0c15 on May 15, 2026 16:35
@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-03-multi-instance branch from c69afd6 to a8ef24a on May 15, 2026 16:36
@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-02-platform-installer branch from ebc0c15 to 14e5c1e on May 15, 2026 16:57
@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-03-multi-instance branch from a8ef24a to 78b2ca6 on May 15, 2026 16:58
dmihalcik-virtru added a commit that referenced this pull request May 21, 2026
#450)

## Summary

First PR in a five-part stack that introduces a multi-instance test
harness and a Claude plugin for OpenTDF bug reproduction. This PR adds
*only* the shared Pydantic schema in `otdf-sdk-mgr` — no consumers yet.

- Adds `otdf_sdk_mgr.schema` with v2 models: `Scenario`, `Instance`,
`PlatformPin`, `KasPin`, `SdkPin`, `ScenarioSdks`, `Suite`, etc.
- `ScenarioSdks.encrypt` / `.decrypt` mirror xtest's existing
`--sdks-encrypt` / `--sdks-decrypt` convention so a→b-only scenarios are
first-class.
- `python -m otdf_sdk_mgr.schema validate <path>` validates either a
Scenario or an Instance file based on its `kind:`.
- Adds `pydantic` + `ruamel.yaml` to `otdf-sdk-mgr/pyproject.toml`.
- 6 unit tests covering round-trips, pin invariants, and unknown-field
rejection.

## Stack

1. [**This PR**](#450) — Shared
schema
2. [Platform installer + `install
scenario`](#451) in `otdf-sdk-mgr`
(builds on this)
3. `otdf-local` [multi-instance
refactor](#452) + new CLI
subcommands
4. `xtest/conftest.py`
[integration](#453) (`--scenario`,
`--instance`)
5. [Claude plugin](#454)
(`.claude/skills/`, settings, plugin manifest)
6. #455

## Test plan

- [x] `cd otdf-sdk-mgr && uv run pytest tests/test_schema.py` — all 6
pass
- [x] `uv run python -m otdf_sdk_mgr.schema validate <path>` accepts a
valid scenarios.yaml and rejects unknown fields

Jira: https://virtru.atlassian.net/browse/DSPX-3302

🤖 Generated with [Claude Code](https://claude.com/claude-code)


## Summary by CodeRabbit

* **New Features**
  * Added schema validation for OpenTDF Scenario and Instance YAML configurations with a new CLI command.
  * Introduced strict validation with cross-field constraints for SDK and platform configurations.

* **Documentation**
  * Updated supported container formats from `nano` to `ztdf-ecwrap`.

* **Dependencies**
  * Updated core package dependencies to support enhanced validation capabilities.


---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-03-multi-instance branch from 78b2ca6 to e196e43 on May 21, 2026 15:38
@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-02-platform-installer branch from 14e5c1e to 9993b12 on May 21, 2026 15:38
@github-actions


X-Test Failure Report

@github-actions


@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-02-platform-installer branch from ec1f655 to 13b5c96 on May 22, 2026 01:46
@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-03-multi-instance branch 2 times, most recently from 5b1c928 to a1bcecc on May 22, 2026 13:50
@github-actions


@github-actions


X-Test Failure Report

@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-03-multi-instance branch from e7d13f5 to 6832d58 on May 28, 2026 12:46
@github-actions


X-Test Failure Report

@github-actions


X-Test Failure Report

@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-03-multi-instance branch from 6d83353 to b441b38 on June 10, 2026 18:37
@github-actions


X-Test Failure Report

@sonarqubecloud


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
otdf-local/src/otdf_local/config/ports.py (1)

1-60: ⚠️ Potential issue | 🔴 Critical

Run the required Python quality gates for otdf-local (pyright is missing)
ruff check and ruff format --check passed for otdf-local/, but pyright otdf-local did not run because pyright is not found (/bin/bash: pyright: command not found). Ensure pyright is installed/available and rerun the quality gates before committing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@otdf-local/src/otdf_local/config/ports.py` around lines 1 - 60, CI failed
because the Pyright type checker is not installed/runnable, so update the repo
so `pyright` is available and the quality gate runs; install Pyright as a
project/tooling dependency (e.g., add to repo dev dependencies or install via
npm/yarn in the CI image) or ensure the CI runner has Pyright on PATH, then
re-run the type checks (verify it covers otdf_local.config.ports.Ports and its
methods like get_kas_port, platform_port_for, all_kas_names, standard_kas_names,
km_kas_names, is_km_kas) and fix any type errors reported.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ee88be39-2115-41a7-9e50-e11592cdec3f

📥 Commits

Reviewing files that changed from the base of the PR and between 200f430 and b441b38.

⛔ Files ignored due to path filters (1)
  • otdf-local/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • .gitignore
  • otdf-local/pyproject.toml
  • otdf-local/src/otdf_local/cli.py
  • otdf-local/src/otdf_local/cli_instance.py
  • otdf-local/src/otdf_local/cli_scenario.py
  • otdf-local/src/otdf_local/config/ports.py
  • otdf-local/src/otdf_local/config/settings.py
  • otdf-local/src/otdf_local/services/docker.py
  • otdf-local/src/otdf_local/services/kas.py
  • otdf-local/src/otdf_local/services/platform.py
  • otdf-local/src/otdf_local/utils/keys.py
  • otdf-local/tests/test_cli_scenario.py
  • otdf-local/tests/test_multi_instance.py
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (8)
  • otdf-local/src/otdf_local/services/docker.py
  • otdf-local/src/otdf_local/cli_scenario.py
  • otdf-local/src/otdf_local/services/platform.py
  • otdf-local/tests/test_cli_scenario.py
  • otdf-local/pyproject.toml
  • otdf-local/src/otdf_local/cli.py
  • otdf-local/src/otdf_local/utils/keys.py
  • otdf-local/src/otdf_local/config/settings.py

Comment on lines +30 to +35
 def get_kas_port(cls, name: str, *, base: int = 8080) -> int:
     offset = cls.KAS_OFFSETS.get(name)
     if offset is None:
         raise ValueError(f"Unknown KAS instance: {name}")
-    return getattr(cls, attr)
+    return base + offset


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate base and computed KAS ports are within valid TCP range.

get_kas_port accepts any integer base, so base + offset can become <1 or >65535, which pushes failure to later service startup instead of failing fast here.

Proposed fix
 @classmethod
 def get_kas_port(cls, name: str, *, base: int = 8080) -> int:
+    if not (1 <= base <= 65535):
+        raise ValueError(f"Invalid base port: {base}")
     offset = cls.KAS_OFFSETS.get(name)
     if offset is None:
         raise ValueError(f"Unknown KAS instance: {name}")
-    return base + offset
+    port = base + offset
+    if not (1 <= port <= 65535):
+        raise ValueError(f"Computed port out of range for {name}: {port}")
+    return port
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@otdf-local/src/otdf_local/config/ports.py` around lines 30 - 35, The
get_kas_port function lets any integer base be passed so base + offset can fall
outside valid TCP port range; update get_kas_port (and use KAS_OFFSETS) to
validate that base is an int within 1..65535 and that computed_port = base +
offset is also within 1..65535 and raise a ValueError with a clear message
showing the invalid base or computed_port when out of range; perform these
checks before returning the port so callers fail fast with an informative error.

Comment on lines +60 to +68
def _instance_paths(self) -> tuple[Path, Path] | None:
    """Return (binary, worktree) for an instance-pinned KAS, or None."""
    instance = self.settings.load_instance()
    if instance is None:
        return None
    pin = instance.kas.get(self._kas_name)
    if pin is None or pin.dist is None:
        return None
    return self.settings.resolve_binary_worktree(pin.dist)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check KasPin schema definition to see if it supports source field
ast-grep --pattern $'class KasPin($_):
  $$$
'
# Also check if there's a source field in KasPin or its parents
rg -n "class KasPin" -A 30

Repository: opentdf/tests

Length of output: 4541


Handle KasPin.source in KASService._instance_paths
KasPin defines both dist and source and enforces that exactly one of them is set. But otdf-local/src/otdf_local/services/kas.py (lines 60-68) only resolves when pin.dist is present, returning None when pin.dist is None, so source-pinned KAS instances can’t resolve their pinned binary/worktree. Update _instance_paths to also resolve via pin.source (similar to how cli_instance.py handles platform pins).
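A sketch of the suggested fix under stated assumptions: `KasPin` below is a stand-in dataclass rather than the real `otdf_sdk_mgr.schema` model, and `resolve_source_worktree` is a hypothetical resolver name, since the review points at `cli_instance.py` without naming the function it uses for source pins.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass
class KasPin:
    """Stand-in for the schema's KasPin: exactly one of dist/source is set."""
    dist: str | None = None
    source: str | None = None


class Resolver(Protocol):
    def resolve_binary_worktree(self, dist: str) -> tuple[Path, Path]: ...

    # Hypothetical: the PR does not name the resolver used for source pins.
    def resolve_source_worktree(self, source: str) -> tuple[Path, Path]: ...


def instance_paths(settings: Resolver, pin: KasPin | None) -> tuple[Path, Path] | None:
    """Return (binary, worktree) for a dist- or source-pinned KAS, else None."""
    if pin is None:
        return None
    if pin.dist is not None:
        return settings.resolve_binary_worktree(pin.dist)
    if pin.source is not None:
        return settings.resolve_source_worktree(pin.source)
    return None
```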

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@otdf-local/src/otdf_local/services/kas.py` around lines 60 - 68,
KASService._instance_paths currently only handles KasPin.dist and returns None
for source-pinned KAS; update _instance_paths to also check KasPin.source and
resolve it the same way platform/source pins are handled in cli_instance.py: if
pin.dist use self.settings.resolve_binary_worktree(pin.dist), else if pin.source
resolve the pinned paths via the same resolver used for platform/source pins
(the code path in cli_instance.py), returning the resolved (binary, worktree)
tuple instead of None. Ensure you reference KasPin.source and KasPin.dist inside
KASService._instance_paths and call the appropriate settings resolver
consistently.

@github-actions


dmihalcik-virtru and others added 17 commits July 2, 2026 17:25
Refactors otdf-local from a single-instance CLI (one platform checkout,
fixed ports, hardcoded six KAS instances) into a multi-instance harness
where each named instance under tests/instances/<name>/ owns its own
opentdf.yaml, keys, KAS configs, and port range.

Why
---

A single bug report often describes a *combination* — platform v0.9.0
with Java SDK 0.7.8 and a KAS at a pre-release. Today a developer has
to hand-edit configs and re-checkout the platform to reproduce. After
this change:

  otdf-local instance init java-078 --from-scenario .../scenario.yaml
  otdf-local --instance java-078 up

brings up exactly the topology the scenario describes, using platform
binaries that otdf-sdk-mgr already provisioned (each instance, and each
KAS within an instance, can reference a different pinned version). Two
instances on disjoint ports.base can coexist on a developer laptop.

What changes
------------

otdf-local now depends on otdf-sdk-mgr via a uv path source so both
tools share the canonical Scenario/Instance schema.

Settings (otdf_local.config.settings):
  - New instance_name (env-overridable via OTDF_LOCAL_INSTANCE_NAME),
    instance_dir, instances_root, instance_yaml properties.
  - platform_dir becomes optional; legacy sibling-discovery only kicks
    in when no per-instance configuration is present.
  - platform_binary_for(dist) resolves to the otdf-sdk-mgr-managed
    xtest/platform/dist/<dist>/service binary.
  - keys_dir, logs_dir, config_dir, platform_config, and
    get_kas_config_path switch to per-instance paths whenever
    instance.yaml exists; legacy behavior is preserved otherwise.
  - load_instance() reads the per-instance manifest via the shared
    Pydantic model.
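A toy model of the rule above (per-instance paths only when `instance.yaml` exists), assuming instances live at `instances_root/<name>/` and that legacy keys sit directly in `platform_dir`, as the `setup_golden_keys` notes in this commit describe. Field names mirror the commit message; the class itself is illustrative, not the real `Settings`.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Settings:
    """Toy model of instance-aware path resolution (hypothetical layout)."""
    instances_root: Path
    platform_dir: Path | None = None  # legacy sibling checkout, now optional
    instance_name: str | None = None

    def __post_init__(self) -> None:
        # OTDF_LOCAL_INSTANCE_NAME overrides the configured name.
        self.instance_name = os.environ.get("OTDF_LOCAL_INSTANCE_NAME", self.instance_name)

    @property
    def instance_dir(self) -> Path | None:
        return self.instances_root / self.instance_name if self.instance_name else None

    @property
    def instance_yaml(self) -> Path | None:
        d = self.instance_dir
        return d / "instance.yaml" if d is not None else None

    @property
    def keys_dir(self) -> Path:
        manifest = self.instance_yaml
        # Per-instance layout activates only when instance.yaml actually exists.
        if manifest is not None and manifest.exists():
            return manifest.parent / "keys"
        if self.platform_dir is None:
            raise RuntimeError("no instance.yaml and no legacy platform_dir")
        return self.platform_dir  # legacy: keys live in the platform checkout
```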

Ports (otdf_local.config.ports):
  - KAS_OFFSETS exposes the offset table (alpha=+101, beta=+202, ...,
    km2=+606) so multiple instances on different bases get disjoint
    port ranges. The legacy 8080-based constants are preserved as
    defaults.
  - get_kas_port(name, base=...) computes the port relative to base.

Services (otdf_local.services.platform / .kas):
  - PlatformService.start() and KASService.start() use the pinned dist
    binary at xtest/platform/dist/<dist>/service when an instance is
    loaded, with cwd set to the recorded worktree so the binary finds
    its embedded resources. Legacy `go run ./service` path runs
    unchanged when no instance is active.
  - KASService.is_key_management defers to the manifest's `mode` field
    instead of the legacy name-based heuristic; per-KAS features (e.g.
    ec_tdf_enabled) pass through to opentdf.yaml.
  - KASManager constructs only the KAS instances listed in
    instance.yaml's kas: map. start_standard / start_km filter on
    is_key_management so subset topologies still work.
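A sketch of the manifest-driven filtering, assuming `mode` is a plain string whose key-management value is `"key_management"` (the commit does not give the actual values). `get_instance_names` matches the public method a later commit adds; `standard` and `key_management` are hypothetical stand-ins for the filters behind `start_standard` / `start_km`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class KasEntry:
    name: str
    mode: str  # hypothetical values: "standard" or "key_management"

    @property
    def is_key_management(self) -> bool:
        # Defer to the manifest's mode field rather than to names like "km1".
        return self.mode == "key_management"


class KASManager:
    """Build only the KAS instances listed in instance.yaml's kas: map."""

    def __init__(self, kas_map: dict[str, str]) -> None:
        self._instances = {name: KasEntry(name, mode) for name, mode in kas_map.items()}

    def get_instance_names(self) -> list[str]:
        return list(self._instances)

    def standard(self) -> list[str]:
        return [n for n, k in self._instances.items() if not k.is_key_management]

    def key_management(self) -> list[str]:
        return [n for n, k in self._instances.items() if k.is_key_management]
```

A subset topology such as `{"alpha": "standard", "km2": "key_management"}` then starts exactly one KAS in each phase, with no assumption that beta or km1 exist.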

utils.keys.setup_golden_keys:
  - Writes key files into the target directory (per-instance keys_dir
    or legacy platform_dir) and uses absolute paths in the generated
    keys_config so the binary finds them regardless of cwd.
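A sketch of the absolute-path idea: resolving `keys_dir` once means the generated entries point at the same files whatever cwd the pinned binary runs in. The entry fields (`kid`, `alg`, `private`, `cert`) are assumed to mirror the platform's key-list config and may not match what `setup_golden_keys` actually writes; `golden_keys_config` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def golden_keys_config(
    keys_dir: Path, entries: list[tuple[str, str, str, str]]
) -> list[dict[str, str]]:
    """Build key-config entries whose file paths are absolute.

    Each entry is (kid, alg, private_file, cert_file); field names are an assumption.
    """
    base = keys_dir.resolve()  # absolute, so the binary's cwd no longer matters
    return [
        {"kid": kid, "alg": alg, "private": str(base / priv), "cert": str(base / cert)}
        for kid, alg, priv, cert in entries
    ]
```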

CLI:
  - New top-level --instance option threads through every command via
    OTDF_LOCAL_INSTANCE_NAME.
  - New `instance` subcommand group: init [--from-scenario PATH],
    ls --json, rm.
  - New `scenario` subcommand: `run <path>` translates the scenario's
    suite block into `pytest --sdks-encrypt ... --sdks-decrypt ...
    --containers ...` under xtest/ with OTDF_LOCAL_INSTANCE_NAME set.

Tests (otdf-local/tests/test_multi_instance.py):
  - Port arithmetic at default and alternate bases.
  - Settings round-trip with and without an instance.yaml.
  - platform_binary_for resolves under the otdf-sdk-mgr-managed
    xtest/platform/ tree.

.gitignore additions:
  - tests/instances/ (per-instance config and logs)
  - xtest/scenarios/*.installed.json (provisioning records)
  - .claude/tmp/

Backward compatibility:
  - `otdf-local up` with no --instance flag keeps working against a
    sibling platform/ checkout.

Refs: https://virtru.atlassian.net/browse/DSPX-3302

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Before this change, `otdf-local instance init` only wrote `instance.yaml`
and empty subdirs. Anyone running a fresh instance had to manually copy
keys from another worktree, run `init-temp-keys.sh` by hand, and copy
`opentdf-dev.yaml` into the instance dir before `up` would succeed —
otherwise Keycloak crash-looped on a missing `truststore.jks`, and
pytest failed with `OT_ROOT_KEY environment variable is not set`.

Changes:
- utils/keys.py: add `generate_localhost_cert()` and `generate_ca_jks()`
  to produce the Keycloak TLS pair + JKS truststore (matches the
  platform's `init-temp-keys.sh`). `generate_ca_jks()` runs `keytool`
  inside the `keycloak/keycloak:25.0` image so a local JDK isn't
  required. `ensure_keys_exist()` now generates the full bootstrap
  bundle, idempotently.
- cli_instance.py: `_init_from_scenario` and `_init_minimal` call a new
  `_provision_instance_dir()` helper that runs `ensure_keys_exist()` and
  copies the platform's `opentdf-dev.yaml` (or `opentdf-example.yaml`)
  into the instance dir, overriding `services.kas.root_key` with a
  freshly generated value so every instance owns its own root key.
- services/platform.py: `_generate_config()` preserves an existing
  per-instance `opentdf.yaml`, only patching logger + golden-key fields
  in place, so the init-time `root_key` survives restarts.
- services/docker.py: docker-compose subprocesses are now run with
  `KEYS_DIR=<instance>/keys` so the compose file's `${KEYS_DIR:-./keys}`
  mounts resolve to the per-instance bundle.
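Two of the provisioning steps, sketched with plain dicts instead of a YAML round-trip. Assumptions: the root key is 32 random bytes, hex-encoded; `KEYS_DIR` is passed as an absolute path (the commit only says `<instance>/keys`); both helper names are hypothetical.

```python
from __future__ import annotations

import os
import secrets
from pathlib import Path
from typing import Any


def with_fresh_root_key(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an opentdf.yaml mapping with services.kas.root_key replaced."""
    services = dict(config.get("services") or {})
    kas = dict(services.get("kas") or {})
    kas["root_key"] = secrets.token_hex(32)  # 64 hex chars, unique per instance
    services["kas"] = kas
    return {**config, "services": services}


def compose_env(keys_dir: Path) -> dict[str, str]:
    """Environment for docker-compose so ${KEYS_DIR:-./keys} resolves per instance."""
    env = dict(os.environ)
    env["KEYS_DIR"] = str(keys_dir.resolve())
    return env
```

A caller would then run something like `subprocess.run(["docker", "compose", "up", "-d"], env=compose_env(keys_dir), check=True)`. Copying the nested dicts keeps the source config untouched, mirroring how `_generate_config()` only patches the fields it owns.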

Users can now run:

  otdf-local instance init <name> --from-scenario path/to/scenario.yaml
  otdf-local --instance <name> up
  eval $(otdf-local --instance <name> env)
  cd xtest && uv run pytest ...

with no manual key-copying, no editing of `opentdf.yaml`, and no
shell-script fallback. Verified end-to-end against `pure-mlkem.yaml`
(PR opentdf/platform#3537): all 9 services come up healthy on the first
try and `env` exports `OT_ROOT_KEY`.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…chema

`_build_pytest_args` read `suite.select` and treated `suite.containers`
as a string, but the Pydantic Suite model exposes `targets: list[str]`
and `containers: list[ContainerKind]`. Any user invoking
`otdf-local scenario run` hit AttributeError. Also wires `suite.kexpr`
through as `-k`; it was silently dropped.

Adds unit tests covering empty/multi targets, container join, kexpr,
markers + extra args, and SDK token forwarding.
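A sketch of the corrected translation, using a stand-in `Suite` with the list-typed fields named above. How `markers` and extra args map to pytest, and that `--containers` takes a space-separated list, are assumptions; `build_pytest_args` is illustrative, not the actual `_build_pytest_args`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Suite:
    """Stand-in for the schema's Suite, with list-typed targets and containers."""
    targets: list[str] = field(default_factory=list)
    containers: list[str] = field(default_factory=list)
    kexpr: str | None = None
    markers: str | None = None  # assumption: forwarded as -m
    extra_args: list[str] = field(default_factory=list)  # assumption


def build_pytest_args(suite: Suite, encrypt: list[str], decrypt: list[str]) -> list[str]:
    """Translate a scenario suite into pytest argv (sketch)."""
    args: list[str] = list(suite.targets)  # zero or more test paths / node ids
    if encrypt:
        args += ["--sdks-encrypt", " ".join(encrypt)]
    if decrypt:
        args += ["--sdks-decrypt", " ".join(decrypt)]
    if suite.containers:
        # Assumption: xtest accepts a space-separated container list.
        args += ["--containers", " ".join(suite.containers)]
    if suite.kexpr:
        args += ["-k", suite.kexpr]  # previously dropped, now forwarded
    if suite.markers:
        args += ["-m", suite.markers]
    return args + list(suite.extra_args)
```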

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…leanup

- `up` command now uses `settings.get_platform_port()` and iterates
  `kas_manager._instances` with `settings.get_kas_port()` for health checks
  so non-default instances with a different `ports.base` work correctly
- Add `Settings.get_platform_port()` alongside the existing `get_kas_port()`
- Simplify metadata name update: `instance.metadata.name = name` (frozen=False)
- Use `shlex.join(cmd)` for display in cli_scenario.py
- Add `"Instance | None"` return type to `load_instance` via TYPE_CHECKING
- Drop unused `Path` import in cli.py, stale `os` import in test file

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
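Why health checks must go through the settings helpers: with a per-instance `ports.base` and an offset table, the same service lands on different ports in different instances. The sketch below assumes a hypothetical `PortSettings` class, made-up offset values, and a `/healthz` path; the PR describes a `KAS_OFFSETS` table but its real values are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical offsets relative to the instance's ports.base.
PLATFORM_OFFSET = 0
KAS_OFFSETS: dict[str, int] = {"alpha": 1, "beta": 2, "gamma": 3, "delta": 4}


@dataclass(frozen=True)
class PortSettings:
    ports_base: int

    def get_platform_port(self) -> int:
        return self.ports_base + PLATFORM_OFFSET

    def get_kas_port(self, name: str) -> int:
        return self.ports_base + KAS_OFFSETS[name]

    def health_urls(self, kas_names: list[str]) -> list[str]:
        # Derive every URL from the helpers instead of hard-coding default ports.
        urls = [f"http://localhost:{self.get_platform_port()}/healthz"]
        urls += [f"http://localhost:{self.get_kas_port(n)}/healthz" for n in kas_names]
        return urls
```

Two instances whose bases differ by more than the largest offset get disjoint port sets, which is what lets them run side by side.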
Guard platform_dir None-access in env command; replace non-existent
PlatformPin.image attribute with "unknown" fallback in ls command.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- cli_scenario: set OTDF_LOCAL_INSTANCE_NAME + clear settings cache before
  get_settings() so scenario-driven instance name is picked up
- cli_instance: add _validate_instance_name() to guard against path traversal
  in init/rm; add --force flag to init to prevent silent overwrite
- kas: add get_instance_names() public method; replace _instances access in cli
- keys: generate_ca_jks() now imports cert only (keytool -importcert) so ca.jks
  is a proper truststore; ensure_keys_exist() guards include cert files alongside
  private keys to catch partial-init broken state

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
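Two of the fixes above, sketched under assumptions. `validate_instance_name` is a hypothetical stand-in for `_validate_instance_name()` with an assumed naming rule; the `get_settings` here is a toy `lru_cache`-based loader illustrating why the cache must be cleared after setting `OTDF_LOCAL_INSTANCE_NAME`, not the real `Settings` object.

```python
from __future__ import annotations

import os
import re
from functools import lru_cache
from pathlib import Path

# Assumed rule: start with an alphanumeric, then alphanumerics, '.', '_' or '-'.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_instance_name(name: str, instances_root: Path) -> Path:
    """Return the instance directory, or raise ValueError if `name` could escape the root."""
    if not _NAME_RE.fullmatch(name) or ".." in name:
        raise ValueError(f"invalid instance name: {name!r}")
    root = instances_root.resolve()
    candidate = (root / name).resolve()
    # Also reject anything that resolves outside the root, e.g. through a symlink.
    if candidate.parent != root:
        raise ValueError(f"instance name escapes {root}: {name!r}")
    return candidate


@lru_cache(maxsize=1)
def get_settings() -> dict[str, str]:
    """Toy cached loader that reads the instance name from the environment once."""
    return {"instance_name": os.environ.get("OTDF_LOCAL_INSTANCE_NAME", "default")}


def select_instance(name: str) -> dict[str, str]:
    os.environ["OTDF_LOCAL_INSTANCE_NAME"] = name
    get_settings.cache_clear()  # otherwise the previously cached value is returned
    return get_settings()
```

The regex alone already rules out `/` and leading dots; the resolved-parent check is a second layer for symlinked entries.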
Reverts the keytool -importcert change from the previous commit.
The PKCS12 + importkeystore approach mirrors init-temp-keys.sh in the
platform repo exactly (lines 65-90); Keycloak requires this form of
ca.jks and the cert-only truststore broke it.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
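The PKCS12-then-`importkeystore` shape can be illustrated by building the two command lines. This is a sketch only: `ca_jks_commands`, the CA private-key file name, and the password are assumptions and are not copied from `init-temp-keys.sh`; in otdf-local the `keytool` step runs inside a Keycloak container rather than on the host. The result is a keystore holding the CA certificate together with its private key, rather than a cert-only truststore entry.

```python
from __future__ import annotations

from pathlib import Path


def ca_jks_commands(key_dir: Path, password: str = "password") -> list[list[str]]:
    """Return (without running) the openssl and keytool commands that produce ca.jks."""
    ca_pem = key_dir / "keycloak-ca.pem"
    ca_key = key_dir / "keycloak-ca-private.pem"  # hypothetical file name
    ca_p12 = key_dir / "ca.p12"
    ca_jks = key_dir / "ca.jks"
    # Step 1: bundle the CA cert and key into a PKCS#12 file.
    pkcs12 = [
        "openssl", "pkcs12", "-export",
        "-in", str(ca_pem), "-inkey", str(ca_key),
        "-out", str(ca_p12), "-passout", f"pass:{password}",
    ]
    # Step 2: convert the PKCS#12 bundle into a JKS keystore.
    importkeystore = [
        "keytool", "-importkeystore",
        "-srckeystore", str(ca_p12), "-srcstoretype", "PKCS12",
        "-srcstorepass", password,
        "-destkeystore", str(ca_jks), "-deststoretype", "JKS",
        "-deststorepass", password,
        "-noprompt",
    ]
    return [pkcs12, importkeystore]
```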
@dmihalcik-virtru dmihalcik-virtru force-pushed the DSPX-3302-03-multi-instance branch from b441b38 to 327f045 Compare July 2, 2026 21:25
@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
otdf-local/src/otdf_local/utils/keys.py (1)

311-354: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

ensure_keys_exist still doesn't track keycloak-ca.pem, and the CA/JKS regeneration coupling is broken.

A prior review flagged that keycloak-ca.pem isn't included in the existence checks; that fix was never applied here (ca_cert is absent from both the fast-path guard at Lines 334-344 and the generate_localhost_cert guard at Line 350). The consequence is worse than a FileNotFoundError risk: generate_localhost_cert() always generates a brand-new self-signed CA on every call (Lines 166-186). So if only localhost.key/localhost.crt are missing (e.g., after partial corruption) while ca.jks still exists, Line 352 skips regenerating ca.jks, leaving it built from the old CA while the new leaf cert is signed by the new CA. Keycloak's mounted truststore (ca.jks) would then no longer trust the freshly generated leaf cert, causing silent TLS trust failures instead of an obvious error.

🔒 Proposed fix: track ca_cert and couple CA/JKS regeneration
     localhost_key = key_dir / "localhost.key"
     localhost_cert = key_dir / "localhost.crt"
+    ca_cert = key_dir / "keycloak-ca.pem"
     ca_jks = key_dir / "ca.jks"

     if (
         not force
         and rsa_private.exists()
         and rsa_cert.exists()
         and ec_private.exists()
         and ec_cert.exists()
         and localhost_key.exists()
         and localhost_cert.exists()
+        and ca_cert.exists()
         and ca_jks.exists()
     ):
         return False

     if force or not rsa_private.exists() or not rsa_cert.exists():
         generate_rsa_keypair(key_dir, "kas")
     if force or not ec_private.exists() or not ec_cert.exists():
         generate_ec_keypair(key_dir, "kas-ec")
-    if force or not localhost_key.exists() or not localhost_cert.exists():
+    ca_regenerated = False
+    if (
+        force
+        or not localhost_key.exists()
+        or not localhost_cert.exists()
+        or not ca_cert.exists()
+    ):
         generate_localhost_cert(key_dir)
-    if force or not ca_jks.exists():
+        ca_regenerated = True
+    if force or ca_regenerated or not ca_jks.exists():
         generate_ca_jks(key_dir)
     return True
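The coupling rule in the proposed fix can be exercised without touching the filesystem. The sketch below is a hypothetical `plan_key_regeneration` helper, not part of `keys.py`: it takes the set of artifacts already present (named after the diff's local variables) and returns which `generate_*` steps would run.

```python
from __future__ import annotations


def plan_key_regeneration(existing: set[str], force: bool = False) -> list[str]:
    """List the generate_* steps ensure_keys_exist would run under the proposed fix."""

    def missing(*names: str) -> bool:
        return force or any(n not in existing for n in names)

    steps: list[str] = []
    if missing("rsa_private", "rsa_cert"):
        steps.append("generate_rsa_keypair")
    if missing("ec_private", "ec_cert"):
        steps.append("generate_ec_keypair")
    ca_regenerated = False
    if missing("localhost_key", "localhost_cert", "ca_cert"):
        steps.append("generate_localhost_cert")
        ca_regenerated = True  # this step always mints a fresh CA
    if ca_regenerated or missing("ca_jks"):
        steps.append("generate_ca_jks")  # keep the truststore in sync with the CA
    return steps
```

An empty result corresponds to the fast path returning `False`.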
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@otdf-local/src/otdf_local/utils/keys.py` around lines 311 - 354,
ensure_keys_exist currently misses keycloak-ca.pem in its existence checks, and
generate_localhost_cert() can create a new CA without forcing ca.jks to be
rebuilt. Update ensure_keys_exist so the fast-path guard and the localhost cert
regeneration path both account for the CA cert (using the existing
ca_cert/localhost cert symbols), and make generate_localhost_cert() return
enough signal for ensure_keys_exist to also regenerate ca_jks whenever the CA
changes. This keeps the truststore in sync with the certs and avoids stale
Keycloak trust material.
🧹 Nitpick comments (1)
otdf-local/src/otdf_local/utils/keys.py (1)

272-302: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Add a timeout to the docker run keytool call.

Unlike the pure openssl invocations, this shells out to docker run, which may pull keycloak/keycloak:25.0 over the network. Without a timeout= argument, a slow or failed pull, or an unresponsive Docker daemon, will hang the CLI (and any CI job invoking instance init) indefinitely.

♻️ Proposed fix
     result = subprocess.run(
         [
             "docker",
             "run",
             ...
         ],
         capture_output=True,
         text=True,
+        timeout=120,
     )
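One way to satisfy the "surface a clear failure" half of this suggestion, sketched under assumptions: `run_with_timeout` is a hypothetical helper, and the 120 s default simply mirrors the proposed diff. `subprocess.run` kills the child process when the timeout expires and re-raises `TimeoutExpired`, which the helper converts into a readable `RuntimeError`.

```python
from __future__ import annotations

import subprocess
from typing import Sequence


def run_with_timeout(
    cmd: Sequence[str], timeout: float = 120.0
) -> subprocess.CompletedProcess[str]:
    """Run `cmd` with a hard timeout; raise RuntimeError on timeout or non-zero exit."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(
            f"{cmd[0]} did not finish within {timeout:g}s "
            "(slow image pull or unresponsive Docker daemon?)"
        ) from exc
    if result.returncode != 0:
        raise RuntimeError(f"{cmd[0]} failed ({result.returncode}): {result.stderr.strip()}")
    return result
```

Note that killing the docker CLI client does not necessarily stop a container that has already started, so the timeout bounds the wait, not necessarily the container's lifetime.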
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@otdf-local/src/otdf_local/utils/keys.py` around lines 272 - 302, The
docker-based keytool invocation in the ca.p12/ca.jks conversion path can hang
indefinitely, so add an explicit timeout to the subprocess.run call in the
keys.py helper that shells out to docker run. Update the existing subprocess.run
call for the keytool importkeystore command to use a reasonable timeout and
ensure any timeout handling surfaces a clear failure, keeping the behavior
aligned with the other key-generation utilities in this module.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@otdf-local/src/otdf_local/utils/keys.py`:
- Around line 311-354: ensure_keys_exist currently misses keycloak-ca.pem in its
existence checks, and generate_localhost_cert() can create a new CA without
forcing ca.jks to be rebuilt. Update ensure_keys_exist so the fast-path guard
and the localhost cert regeneration path both account for the CA cert (using the
existing ca_cert/localhost cert symbols), and make generate_localhost_cert()
return enough signal for ensure_keys_exist to also regenerate ca_jks whenever
the CA changes. This keeps the truststore in sync with the certs and avoids
stale Keycloak trust material.

---

Nitpick comments:
In `@otdf-local/src/otdf_local/utils/keys.py`:
- Around line 272-302: The docker-based keytool invocation in the ca.p12/ca.jks
conversion path can hang indefinitely, so add an explicit timeout to the
subprocess.run call in the keys.py helper that shells out to docker run. Update
the existing subprocess.run call for the keytool importkeystore command to use a
reasonable timeout and ensure any timeout handling surfaces a clear failure,
keeping the behavior aligned with the other key-generation utilities in this
module.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ad3836d1-3590-4432-b36e-66341942f92f

📥 Commits

Reviewing files that changed from the base of the PR and between b441b38 and 327f045.

⛔ Files ignored due to path filters (1)
  • otdf-local/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • .gitignore
  • otdf-local/pyproject.toml
  • otdf-local/src/otdf_local/cli.py
  • otdf-local/src/otdf_local/cli_instance.py
  • otdf-local/src/otdf_local/cli_scenario.py
  • otdf-local/src/otdf_local/config/ports.py
  • otdf-local/src/otdf_local/config/settings.py
  • otdf-local/src/otdf_local/services/docker.py
  • otdf-local/src/otdf_local/services/kas.py
  • otdf-local/src/otdf_local/services/platform.py
  • otdf-local/src/otdf_local/utils/keys.py
  • otdf-local/tests/test_cli_scenario.py
  • otdf-local/tests/test_multi_instance.py
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (9)
  • otdf-local/pyproject.toml
  • otdf-local/tests/test_multi_instance.py
  • otdf-local/src/otdf_local/config/ports.py
  • otdf-local/tests/test_cli_scenario.py
  • otdf-local/src/otdf_local/services/platform.py
  • otdf-local/src/otdf_local/cli.py
  • otdf-local/src/otdf_local/cli_instance.py
  • otdf-local/src/otdf_local/services/kas.py
  • otdf-local/src/otdf_local/config/settings.py
