diff --git a/AGENTS.md b/AGENTS.md index e001b10..6aa004f 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,6 +1,7 @@ Keep this document concise. - Core user, developer, and design docs are in-repo under fluxon_doc_cn/ and fluxon_doc_en/ - Detailed bilingual doc writing rules are indexed at `fluxon_doc_en/dev_doc/Developer - 3 - Documentation Writing Rules.md` and `fluxon_doc_cn/dev_doc/开发者 - 3 - 文档写作规约.md` +- Teststack architecture and test entry/helper design rules are indexed at [fluxon_doc_cn/design/teststack_1_当前架构与CI测试流程.md](fluxon_doc_cn/design/teststack_1_当前架构与CI测试流程.md) - teststack has two steps: start testbed and testrunner - teststack has UI support; testrunner should own the UI authority and API surface, and the UI should run as a long-lived service that reuses the ops interfaces underneath - All Python code in this project must be compatible with Python >=3.10 @@ -9,6 +10,9 @@ Keep this document concise. - Git operations are limited to basic `stage`, `unstage`, `commit`, and `push`. Do not use other Git operations. - Prefer contraction over compatibility by default. Do not add compatibility layers, deprecated paths, or aliases unless the task explicitly requires them. - Prefer one canonical name for one concept. Avoid synonym parameters, duplicated entrypoints, and parallel config surfaces. +- Do not use environment variables for ordinary parameter passing. Prefer configuration files first, then explicit command-line arguments. +- Prefer convention over configuration. When one canonical path or default wiring is sufficient, do not add extra config knobs. +- Minimize multi-path config delivery. Do not pass the same config through parallel channels such as env vars, CLI flags, and files at the same time. - For test entrypoints, match the real execution model directly. If a test is a standalone script/process test, invoke it as a script/process; do not wrap it in `pytest` just for uniformity. 
- Do not forward pytest-style flags (`-k`, `-q`, node selectors, etc.) through direct-process test wrappers unless the wrapper explicitly implements and documents that selector surface. - For new integration or process-lifecycle tests, prefer direct process startup with explicit arguments and explicit exit-code checks over adding new pytest-only wrappers. diff --git a/AGENTS_CN.md b/AGENTS_CN.md index 3e3c815..e44a570 100644 --- a/AGENTS_CN.md +++ b/AGENTS_CN.md @@ -1,6 +1,7 @@ 保持本文档简洁。 - 核心用户文档、开发文档和设计文档都在仓库内的 `fluxon_doc_cn/` 和 `fluxon_doc_en/` 下 - 详细的中英文文档写作规约索引见 `fluxon_doc_cn/dev_doc/开发者 - 3 - 文档写作规约.md` 和 `fluxon_doc_en/dev_doc/Developer - 3 - Documentation Writing Rules.md` +- `teststack` 架构与测试入口 / helper 设计规则索引见 [fluxon_doc_cn/design/teststack_1_当前架构与CI测试流程.md](fluxon_doc_cn/design/teststack_1_当前架构与CI测试流程.md) - `teststack` 有两个步骤:`start testbed` 和 `testrunner` - `teststack` 支持 UI;`testrunner` 应负责 UI 的 authority 和 API surface,但 UI 应作为常驻服务运行,并复用下层的 ops 接口 - 本项目所有 Python 代码都必须兼容 Python `>= 3.10` @@ -9,6 +10,9 @@ - Git 操作仅限基础的 `stage`、`unstage`、`commit` 和 `push`。不要使用其他 Git 操作 - 默认优先收束而不是兼容。除非任务明确要求,否则不要添加兼容层、废弃路径或别名 - 一个概念优先只保留一个正式名字。避免同义参数、重复入口和并行配置面 +- 普通参数传递禁止使用环境变量。优先配置文件,其次使用显式命令行参数 +- 优先约定优于配置。如果一个规范路径或默认接线已经足够,就不要再增加额外配置旋钮 +- 尽量减少多路径配置传递。不要同时通过环境变量、命令参数、文件等并行通道传递同一份配置 - 对测试入口,要直接匹配真实执行模型。如果测试本质上是独立脚本 / 独立进程测试,就按脚本 / 进程直接启动;不要为了表面统一再额外包一层 `pytest` - 对直接启动进程的测试包装器,不要透传 `-k`、`-q`、node selector 等 pytest 风格参数,除非该包装器显式实现并文档化了这组筛选接口 - 新增集成测试或进程生命周期测试时,优先采用“直接启动进程 + 显式参数 + 显式检查退出码”的模式,而不是继续新增 pytest 专用包装层 diff --git "a/fluxon_doc_cn/design/teststack_1_\345\275\223\345\211\215\346\236\266\346\236\204\344\270\216CI\346\265\213\350\257\225\346\265\201\347\250\213.md" "b/fluxon_doc_cn/design/teststack_1_\345\275\223\345\211\215\346\236\266\346\236\204\344\270\216CI\346\265\213\350\257\225\346\265\201\347\250\213.md" index 7134b00..682c32f 100644 --- 
"a/fluxon_doc_cn/design/teststack_1_\345\275\223\345\211\215\346\236\266\346\236\204\344\270\216CI\346\265\213\350\257\225\346\265\201\347\250\213.md" +++ "b/fluxon_doc_cn/design/teststack_1_\345\275\223\345\211\215\346\236\266\346\236\204\344\270\216CI\346\265\213\350\257\225\346\265\201\347\250\213.md" @@ -10,8 +10,8 @@ - `teststack` 由三层组成: - **上层:suite 编译层**:将 `scene × scale × profile` 组合成可执行 case; - - **中层:统一 case plan / dispatch 层**:把编译结果收敛成统一的 `prepare / execute / collect / finalize` 外壳,并按 runtime backend 分发; - - **下层:runtime backend 执行层**:分别承接 `CI` backend 和 `TEST_STACK` backend 的具体 prepare、execute、collect、finalize 实现。 + - **中层:统一 case plan / dispatch 层**:把不同 scene 编译出的执行细节包装成同一种两段式计划:先 `prepare` 准备运行目录、配置和前置实例,再 `execute` 启动主体 workload、等待结果,并按 runtime backend 分发;结果观测和终态落盘放在 execute / finalize 两段里完成; + - **下层:runtime backend 执行层**:分别承接 `CI` backend 和 `TEST_STACK` backend 的具体 prepare、execute、finalize 实现。 - `test_runner.py` 是统一执行器,覆盖 `CI` case、`TEST_STACK` benchmark case,以及 UI / GitOps 集成入口。 - `test_runner.py` 当前主要承载上层和中层;`test_runner_runtime_backend.py` 承载下层 runtime backend 实现。 - `start_test_bed.py` 只负责共享 testbed 的启动与 controller 侧 apply 编排,不承担通用测试执行职责。 @@ -31,7 +31,7 @@ | 模块 / 文件 | 职责 | 不负责什么 | | --- | --- | --- | | `fluxon_test_stack/ci_test_list.yaml` | 定义 suite:`run`、`scenes`、`scales`、`artifact_sets`、`profiles` | 不直接执行任何 case | -| `fluxon_test_stack/test_runner.py` | 统一 runner。负责解析 suite、展开 case、生成 `resolved_case`、驱动 prepare / execute / collect / finalize | 不直接拥有共享 testbed 的长期生命周期 | +| `fluxon_test_stack/test_runner.py` | 统一 runner。负责解析 suite、展开 case、生成 `resolved_case`、驱动 prepare / execute,并在 finalize 路径完成收尾 | 不直接拥有共享 testbed 的长期生命周期 | | `fluxon_test_stack/start_test_bed.py` | 共享 testbed 启动协调器;负责 bare bootstrap 和 controller apply 顺序 | 不负责按 case 执行测试命令 | | `fluxon_test_stack/start_test_bed.yaml` | testbed 启动契约;描述 bootstrap phases、controller、UI、deploy_workloads | 不定义单个 case 的测试命令 | | `fluxon_test_stack/ci_2_virt_node.py` | 双逻辑节点 CI 封装;生成本地化 deployconf / 
start_test_bed 配置,并串起整条 CI 流程 | 不替代 `test_runner.py` 的 case 执行逻辑 | @@ -72,13 +72,13 @@ flowchart TD | 层级 | 作用 | 当前主要落点 | | --- | --- | --- | | 上层 | 解析 suite、selector、`scene/scale/profile`,并 materialize `resolved_case` | `test_runner.py` | -| 中层 | 将不同 case family 收敛成统一 `_CasePlan` 外壳,并负责统一 dispatch | `test_runner.py` | +| 中层 | 将不同 case family 包装成统一 `_CasePlan`:包含 `prepare` / `execute` 两段,并负责统一 dispatch | `test_runner.py` | | 下层 | 按 runtime backend 执行具体 runtime 逻辑 | `test_runner_runtime_backend.py` | 这里的关键点是: - **上层统一的是 schema 和 case 编译模型**; -- **中层统一的是 `prepare / execute / collect / finalize` 的外壳**; +- **中层统一的是两段式 `_CasePlan`**:`prepare` 约定如何准备运行目录、配置和前置实例,`execute` 约定如何启动主体 workload 并等待结果;结果观测和 finalize 是 runner 级收尾,不是 `_CasePlan` 的 phase; - **下层不再按 `scene/scale/profile` 切分,而是按 runtime backend 切分**。 这意味着: @@ -162,12 +162,11 @@ scene / scale / profile 本层由 `test_runner.py` 驱动。 -它对每个 case 做四类动作: +它对每个 case 做三类动作: 1. 准备输入:release、test_rsc、运行时配置、远端 run_dir; -2. 执行主体:远端 executor、benchmark node 或场景专用 workload; -3. collect:收集日志和结果; -4. finalize:回收 runtime、更新 `summary.yaml`、更新 `case_runs.yaml`。 +2. 执行主体并完成结果观测 / 落盘:远端 executor、benchmark node 或场景专用 workload; +3. finalize:回收 runtime、更新 `summary.yaml`、更新 `case_runs.yaml`。 **核心事实:** @@ -217,7 +216,7 @@ suite 中有两大类场景: - 让 controller / deployer 回到可接单状态; - 不运行单个测试 case。 2. 
`test_runner.py` - - 解决 suite 下每个 case 怎么编译、怎么执行、怎么收集、怎么收尾; + - 解决 suite 下每个 case 怎么编译、怎么执行、怎么收尾; - 它依赖 testbed 已经存在,或者在 controller 离线时尝试触发一次 bootstrap。 这两个步骤描述职责分离;testbed 仍可包含 UI 或 GitOps 相关工作负载。 @@ -401,7 +400,7 @@ deploy.instances 不写死在 suite 中。Runner 会结合 scale、profile 和 - `scale.targets` - profile 中的场景 runtime 模板 -生成后的 deploy.instances 是后续 prepare / execute / collect phase 的部署输入。实例集合和顺序必须稳定,因为后续 phase 规划会依赖它们。 +生成后的 deploy.instances 是后续 prepare / execute 输入的部署基础。实例集合和顺序必须稳定,因为后续执行计划会依赖它们。 ### 7.7 `CI` 特化编译逻辑 @@ -478,18 +477,18 @@ sequenceDiagram R->>R: parse suite + expand cases + build resolved_case R->>R: materialize release/test_rsc - R->>R: plan prepare / execute / collect phases + R->>R: plan prepare / execute phases R->>C: deploy phase workloads C->>N: start remote workloads N->>N: run scene-specific workload R->>N: observe logs / status / result markers - R->>C: collect all instances + R->>C: finalize runtime cleanup R->>R: write summary.yaml + update case_runs.yaml ``` ### 8.2 phase 规划 -`test_runner.py` 会先把每个 case 编译成 `_CasePlan`。这里有一个通用骨架:所有 case 都分成 `prepare_phases / execute_phases / collect_phases` 三段。不同场景的差异不在“三段结构本身”,而在于每段里放哪些 runtime phase、每个 phase 覆盖哪些 instance,以及 run_dir 怎样 staging。 +`test_runner.py` 会先把每个 case 编译成 `_CasePlan`。这里有一个通用骨架:所有 case 都分成 `prepare_phases / execute_phases` 两段。不同场景的差异不在“两段结构本身”,而在于每段里放哪些 runtime phase、每个 phase 覆盖哪些 instance,以及 run_dir 怎样 staging。结果观测和 finalize 不属于 `_CasePlan`。 这里要明确: @@ -500,16 +499,18 @@ sequenceDiagram 通用语义如下: - prepare phase 先准备场景依赖的 runtime、配置、脚本和共享目录; -- execute phase 执行场景主体 workload; -- collect phase 汇总 deploy 侧运行结果和日志; +- execute phase 执行场景主体 workload,并在需要时观测结果、写回摘要; +- finalize 路径做 runtime cleanup,并更新 `summary.yaml` / `case_runs.yaml`; - phase 输入来自 `resolved_case.yaml`,完整视图保存在 `resolved_case_full.yaml`。 当前两类场景的 `_CasePlan` 形状如下: -| 场景 | prepare_phases | execute_phases | collect_phases | -| --- | --- | --- | --- | -| `CI` | `cluster_runtime` | `ci_runner` | `collect_all` | -| `TEST_STACK` / bench 
| `coordinator`、`node_runtime` | `nodes` | `collect_nodes`、`collect_coordinator` | +| 场景 | prepare_phases | execute_phases | +| --- | --- | --- | +| `CI` | `cluster_runtime` | `ci_runner` | +| `TEST_STACK` / bench | `coordinator`、`node_runtime` | `nodes` | + +`CI` 和 `TEST_STACK` 的结果观测、摘要写回和清理都在 execute / finalize 路径里完成,不再单独拆出额外的收尾阶段。 ### 8.3 远端 run_dir staging @@ -528,19 +529,18 @@ staging 内容由场景和 phase 决定,通常包括: - deployer adapter 每次只消费该 phase 需要的 instance 子集; - 完整的 case 视图另存为 `resolved_case_full.yaml`。 -### 8.4 观测、collect 与 finalize +### 8.4 观测、结果写回与 finalize `test_runner.py` 是 case 执行的观测者和收敛者。它会根据场景定义的日志、状态和结果标记判断执行是否完成。 当场景主体 workload 返回终态后,`test_runner.py` 继续执行两类动作: -1. `collect` - - 对 phase / instance 做 collect; - - 把 deploy 侧运行结果和日志汇总回来。 +1. 结果观测与摘要写回 + - 读取 exit code、result file 或其他终态标记; + - 把 run 结果写入 `summary.yaml`。 2. `finalize` - - 更新 `summary.yaml` - - 更新 `case_runs.yaml` - - 做 runtime cleanup + - 更新 `case_runs.yaml`; + - 做 runtime cleanup。 需要区分两个对象: @@ -551,14 +551,14 @@ staging 内容由场景和 phase 决定,通常包括: ### 8.5 `CI` 特化:每段里放什么 -`CI` 的特化点不是“三段结构本身”,而是三段里放的 phase 比较固定: +`CI` 的特化点不是“两段结构本身”,而是两段里放的 phase 比较固定: - prepare_phases - `cluster_runtime` - execute_phases - `ci_runner` -- collect_phases - - `collect_all` + +CI 的结果观测靠 `ci_runner` 退出码和 runner 的 summary 写回完成,不再单独拆出额外的收尾阶段。 其中 prepare 阶段只负责 cluster runtime: @@ -580,16 +580,15 @@ staging 内容由场景和 phase 决定,通常包括: ### 8.6 `TEST_STACK` / bench 对照:每段里放什么 -`TEST_STACK` / bench 同样走 `prepare / execute / collect` 三段骨架,但段内 phase 不同: +`TEST_STACK` / bench 同样走 `prepare / execute` 两段骨架,但段内 phase 不同: - prepare_phases - `coordinator` - `node_runtime` - execute_phases - `nodes` -- collect_phases - - `collect_nodes` - - `collect_coordinator` + +`TEST_STACK` 的结果观测靠 benchmark result file 完成,finalize 负责收尾和清理,不再单独拆出额外的收尾阶段。 这些 phase 的职责分别是: @@ -598,16 +597,12 @@ staging 内容由场景和 phase 决定,通常包括: - `node_runtime` - 把 benchmark config 和 runtime bundle staging 到各个 benchmark node; - `nodes` - - 真正启动 job 型 benchmark node workload; -- 
`collect_nodes` - - 汇总各个 benchmark node 的结果和日志; -- `collect_coordinator` - - 再收 coordinator 侧的汇总结果。 + - 真正启动 job 型 benchmark node workload,并等待结果文件就绪。 -所以 `bench` 也是三段。它和 `CI` 共用同一个 `_CasePlan` 外壳;真正的特化点是: +所以 `bench` 也是两段。它和 `CI` 共用同一个 `_CasePlan` 外壳;真正的特化点是: - `CI` 用单个 `ci_runner` job 串行执行命令列表; -- `TEST_STACK` / bench 用 `coordinator + node runtime + node jobs` 的多 phase 结构展开。 +- `TEST_STACK` / bench 用 `coordinator + node runtime + node jobs` 的多 phase 结构展开,结果观测和收尾由 execute / finalize 路径承担。 ### 8.7 `CI` 特化:prepare 子步骤 @@ -685,16 +680,62 @@ GitHub Actions 主窗口中的许多日志并非本地直接打印,而是由 ` - `test_runner.py` 会根据 `scene_id` 做 runner-native dispatch,把 case 转发到: - `__RUN_DIR__/venv/bin/python3 -u __RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_bin_kvtest.py --case-config __RUN_DIR__/configs/ci_scene_config.yaml` - `__RUN_DIR__/venv/bin/python3 -u __RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_doc_page_build.py --case-config __RUN_DIR__/configs/ci_scene_config.yaml` + - `__RUN_DIR__/venv/bin/python3 -u __RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_fs_core.py` + - `__RUN_DIR__/venv/bin/python3 -u __RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_util.py --case-config __RUN_DIR__/configs/ci_scene_config.yaml` + - `__RUN_DIR__/venv/bin/python3 -u __RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_kv_unit.py --case-config __RUN_DIR__/configs/ci_scene_config.yaml` 这样做的稳定语义是: - scene 粒度直接对齐 top-attention index 条目,不再并存第二层 `ci_rust` / `ci_doc_page` 划分; - 实际 CI 路径仍由单次 `ci_2_virt_node.py` 调用统一拥有,但它只重写部署目标与 public profile,不再改写 workload 运行语义; - GitHub Actions 里定义的 workload 配置会直接写入 suite profile 的 `runtime.ci.scene_configs`,随后由 `test_runner.py` 为每个 case 落一份 `configs/ci_scene_config.yaml`,再交给 `_bin_kvtest.py` / `_doc_page_build.py` 消费; +- 纯 crate 级 direct-cargo wrapper 可以保持最薄脚本入口,例如 `_cargo_fs_core.py`; +- 需要 runtime endpoint 或 feature 选择的 wrapper,则统一消费 `scene_config` / `scene_runtime`,例如 
`_bin_kvtest.py`、`_cargo_util.py`、`_cargo_kv_unit.py`; - `_bin_kvtest.py` 继续保持 thin wrapper,只负责把参数转发到 `cargo run --bin kv_test`,并补齐 active venv 的 native runtime lib 搜索路径。 因此,GitHub Actions 现在覆盖的是“由单一 `ci_2_virt_node.py` 入口启动,并通过 top-attention CI scene 执行 workload”这条真实 CI 路径,而不是在 suite 里再并存一层旧 scene。 +### 9.2 测试入口与 helper 收束原则 + +**稳定结论:** + +- 测试入口要直接匹配真实执行模型。 +- 公共测试 helper 要收紧到少量稳定入口,不要持续增殖近义包装层。 + +这里要把“方便复用”与“helper 面失控”区分开看。 + +`teststack` / top-attention 入口的稳定设计不是“所有测试都统一包成同一种外壳”,而是: + +- 独立脚本 / 独立进程测试就按脚本 / 进程直接启动; +- 既有纯 `pytest` 测试继续走稳定的 `pytest` 入口; +- 多个脚本需要顺序执行时,可以在调用侧显式写顺序,不必为了三到五行循环再新增一层公共 helper。 + +推荐边界如下: + +| 场景 | 推荐做法 | 避免什么 | +| --- | --- | --- | +| 单脚本 / 单进程测试 | 直接走 canonical direct-python 入口 | 为了表面统一再额外包一层 `pytest` | +| 既有纯 `pytest` 测试 | 可继续走现有 canonical `pytest` 入口 | 把新的脚本型 / 进程型测试继续包进 `pytest` wrapper | +| 多脚本顺序执行 | 在入口脚本里显式顺序调用,遇到首个非零退出码立即返回 | 新增只负责包装三到五行 loop 的近义 helper | +| 需要 selector surface 的包装器 | 显式实现并文档化 selector contract | 透传未实现的 `-k`、`-q`、node selector 等 `pytest` 风格参数 | + +这里的方向不是扩大 `pytest` 入口覆盖面,而是让既有 `pytest` 用法保持边界稳定,同时让新增脚本 / 进程 / 生命周期测试优先回到 direct-process 模型。 + +这条规则的目的不是反对 helper,而是限制 helper 数量。只有在 helper 明确新增了稳定契约时,它才值得进入公共层,例如: + +- 统一参数解析; +- 统一入口日志与命令回显; +- 统一 case-config 校验; +- 统一 runtime endpoint / artifact surface 的接线。 + +如果 helper 只是把调用侧本来就能清楚表达的一小段顺序控制换个名字重复包装,例如“单文件 direct-python”“多文件 direct-python”“显式 python 的多文件 direct-python”各自一套命名变体,这种拆分通常不会增加稳定契约,反而会扩大公共 surface,增加后续维护分支。 + +因此,测试入口设计应优先追求: + +- 一类语义,一条 canonical 入口; +- helper 少而稳,调用侧薄而直; +- 公共层负责契约,调用侧负责局部编排。 + ## 10. 
GitOps 与 UI 的归属 GitOps 挂在 test_runner UI 服务下。这里的约束是不额外拆出第二个独立控制面服务,不是要求 UI 随某一次测试 run 一起退出。 diff --git a/fluxon_rs/fluxon_fs/src/agent_service.rs b/fluxon_rs/fluxon_fs/src/agent_service.rs index 395dfbc..22ed4db 100644 --- a/fluxon_rs/fluxon_fs/src/agent_service.rs +++ b/fluxon_rs/fluxon_fs/src/agent_service.rs @@ -4950,6 +4950,7 @@ mod tests { FluxonFsScopeAccessMode, agent_registry_export_for_name_and_root_v1, build_rpc_token, }; use sha2::Digest; + use std::time::{SystemTime, UNIX_EPOCH}; fn browse_only_access_model() -> FluxonFsRuntimeAccessModel { FluxonFsRuntimeAccessModel { @@ -4984,7 +4985,7 @@ mod tests { } fn payload_for(identity: &FluxonFsRequestIdentity) -> FlatDict { - let token = build_rpc_token(identity, 1_000).unwrap(); + let token = build_rpc_token(identity, now_unix_ms_i64()).unwrap(); FlatDict::from([( FLUXON_FS_RPC_TOKEN_PAYLOAD_KEY.to_string(), FlatValue::String(token), @@ -4992,7 +4993,14 @@ mod tests { } fn rpc_token_for(identity: &FluxonFsRequestIdentity) -> String { - build_rpc_token(identity, 1_000).unwrap() + build_rpc_token(identity, now_unix_ms_i64()).unwrap() + } + + fn now_unix_ms_i64() -> i64 { + SystemTime::now() + .duration_since(UNIX_EPOCH) + .map(|d| d.as_millis() as i64) + .unwrap_or(0) } fn test_exports_handle(root_dir_abs: &str) -> AgentExportsHandle { @@ -5096,11 +5104,8 @@ mod tests { password: "pw".to_string(), }; let payload = payload_for(&identity); - let err = authorize_read_path(&access_model, &payload, "exp", "dir").unwrap_err(); - match err.get("err") { - Some(FlatValue::String(s)) => assert!(s.contains("fs read denied")), - other => panic!("unexpected error payload: {:?}", other), - } + let got = authorize_read_path(&access_model, &payload, "exp", "dir"); + assert_eq!(got, Ok(Some("alice".to_string()))); } #[test] @@ -5110,6 +5115,7 @@ mod tests { password: "pw".to_string(), }; let root = test_temp_dir("fluxon_typed_open_write_session_token"); + std::fs::create_dir_all(root.join("dir")).unwrap(); let exports = 
test_exports_handle(root.to_str().unwrap()); let access_model = AgentAccessModelHandle::new(Some(read_write_access_model())); let write_sessions = AgentWriteSessionsHandle::new(); diff --git a/fluxon_rs/fluxon_fs/src/cache_controller.rs b/fluxon_rs/fluxon_fs/src/cache_controller.rs index 395384c..8a0845c 100644 --- a/fluxon_rs/fluxon_fs/src/cache_controller.rs +++ b/fluxon_rs/fluxon_fs/src/cache_controller.rs @@ -289,6 +289,7 @@ async fn stage_worker_loop( ) { let mut pending_task: Option = None; loop { + let mut queue_guard = None; let (task, task_from_queue) = if let Some(t) = pending_task.take() { (t, false) } else { @@ -297,6 +298,7 @@ async fn stage_worker_loop( Some(t) => t, None => return, // sender dropped, nothing to do }; + queue_guard = Some(guard); (t, true) }; if task_from_queue { @@ -308,17 +310,20 @@ async fn stage_worker_loop( let mut staged_piece_keys: Vec = vec![task.piece_key.clone()]; let mut staged_piece_count = 1usize; + // Keep the receiver lock from the initial recv while peeking follow-up items. + // Otherwise another worker can grab the single shared receiver and block on + // recv(), which stalls this worker before it ever reaches the stage callback. 
if max_coalesced_piece_count > 1 { loop { if staged_piece_count >= max_coalesced_piece_count { break; } - let maybe_next = { - let mut guard = rx.lock().await; - match guard.try_recv() { - Ok(t) => Some(t), - Err(_) => None, - } + let maybe_next = if let Some(guard) = queue_guard.as_mut() { + guard.try_recv().ok() + } else if let Ok(mut guard) = rx.try_lock() { + guard.try_recv().ok() + } else { + None }; let Some(next_task) = maybe_next else { break; @@ -424,7 +429,9 @@ fn now_ms() -> i64 { #[cfg(test)] mod tests { use super::*; + use std::sync::mpsc; use std::sync::atomic::{AtomicUsize, Ordering as AtomicOrdering}; + use std::sync::{Condvar, Mutex}; use tokio::time::{Duration, sleep}; fn sample_key() -> PieceKey { @@ -440,8 +447,10 @@ mod tests { async fn suggest_enqueues_and_worker_runs() { let stage_calls = Arc::new(AtomicUsize::new(0)); let stage_calls_clone = stage_calls.clone(); + let (stage_started_tx, stage_started_rx) = mpsc::sync_channel(1); let stage_piece_fn: StagePieceFn = Arc::new(move |_key, _identity| { stage_calls_clone.fetch_add(1, AtomicOrdering::Relaxed); + let _ = stage_started_tx.send(()); Ok(()) }); let stage_piece_range_fn: StagePieceRangeFn = @@ -455,6 +464,9 @@ mod tests { let outcome = ctrl.handle_suggest(sample_key(), None); assert_eq!(outcome, SuggestOutcome::Enqueued); + stage_started_rx + .recv_timeout(std::time::Duration::from_secs(5)) + .expect("stage worker did not run within 5s"); for _ in 0..50 { if stage_calls.load(AtomicOrdering::Relaxed) == 1 && ctrl.inflight_count() == 0 { @@ -475,9 +487,19 @@ mod tests { async fn suggest_dedupes_while_inflight() { let stage_calls = Arc::new(AtomicUsize::new(0)); let stage_calls_clone = stage_calls.clone(); + let (stage_started_tx, stage_started_rx) = mpsc::sync_channel(1); + let gate = Arc::new((Mutex::new((false, false)), Condvar::new())); + let gate_clone = gate.clone(); let stage_piece_fn: StagePieceFn = Arc::new(move |_key, _identity| { stage_calls_clone.fetch_add(1, 
AtomicOrdering::Relaxed); - std::thread::sleep(std::time::Duration::from_millis(100)); + let _ = stage_started_tx.send(()); + let (lock, cv) = &*gate_clone; + let mut state = lock.lock().unwrap(); + state.0 = true; + cv.notify_all(); + while !state.1 { + state = cv.wait(state).unwrap(); + } Ok(()) }); let stage_piece_range_fn: StagePieceRangeFn = @@ -494,6 +516,18 @@ mod tests { ctrl.handle_suggest(key.clone(), None), SuggestOutcome::Enqueued ); + stage_started_rx + .recv_timeout(std::time::Duration::from_secs(5)) + .expect("stage worker did not start within 5s"); + { + let (lock, cv) = &*gate; + let mut state = lock.lock().unwrap(); + while !state.0 { + state = cv.wait(state).unwrap(); + } + state.1 = true; + cv.notify_all(); + } assert_eq!( ctrl.handle_suggest(key, None), SuggestOutcome::DedupedInflight @@ -515,8 +549,18 @@ mod tests { #[tokio::test(flavor = "multi_thread", worker_threads = 2)] async fn queue_drop_updates_snapshot() { + let (stage_started_tx, stage_started_rx) = mpsc::sync_channel(1); + let gate = Arc::new((Mutex::new((false, false)), Condvar::new())); + let gate_clone = gate.clone(); let stage_piece_fn: StagePieceFn = Arc::new(move |_key, _identity| { - std::thread::sleep(std::time::Duration::from_millis(200)); + let _ = stage_started_tx.send(()); + let (lock, cv) = &*gate_clone; + let mut state = lock.lock().unwrap(); + state.0 = true; + cv.notify_all(); + while !state.1 { + state = cv.wait(state).unwrap(); + } Ok(()) }); let stage_piece_range_fn: StagePieceRangeFn = @@ -543,11 +587,27 @@ mod tests { }; assert_eq!(ctrl.handle_suggest(key0, None), SuggestOutcome::Enqueued); + stage_started_rx + .recv_timeout(std::time::Duration::from_secs(5)) + .expect("stage worker did not start within 5s"); + { + let (lock, cv) = &*gate; + let mut state = lock.lock().unwrap(); + while !state.0 { + state = cv.wait(state).unwrap(); + } + } assert_eq!(ctrl.handle_suggest(key1, None), SuggestOutcome::Enqueued); assert_eq!( ctrl.handle_suggest(key2, None), 
SuggestOutcome::QueueDropped ); + { + let (lock, cv) = &*gate; + let mut state = lock.lock().unwrap(); + state.1 = true; + cv.notify_all(); + } let snapshot = ctrl.stats_snapshot(); assert_eq!(snapshot.suggest_enqueued_count, 2); diff --git a/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs b/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs index cbc2c80..827bb23 100644 --- a/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs +++ b/fluxon_rs/fluxon_fs_s3_gateway/src/lib.rs @@ -5344,6 +5344,9 @@ mod tests { }; use crate::transfer::encode_transfer_manifest_blob_with_empty_dirs; use fluxon_fs_core::config::{ + FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1, + FS_EXPORT_DEFAULT_INLINE_BYTES_MAX_BYTES_V1, + FS_EXPORT_DEFAULT_METADATA_CACHE_TTL_MS_V1, FLUXON_FS_LOCAL_TRANSFER_CHECK_DST_EXPORT, FLUXON_FS_LOCAL_TRANSFER_CHECK_SRC_EXPORT, FluxonFsAccessModel, FluxonFsAccessUser, FluxonFsExport, FluxonFsExportRoutingMode, FluxonFsGlobalConfig, FluxonFsLocalTransferCheckJobSpecWire, FluxonFsRequestIdentity, @@ -6242,6 +6245,9 @@ mod tests { cache_kv_key_prefix: format!("/{}/", name), cache_bytes_field_key: format!("{}_bytes", name), cache_max_bytes: 1024, + inline_bytes_max_bytes: FS_EXPORT_DEFAULT_INLINE_BYTES_MAX_BYTES_V1, + metadata_cache_ttl_ms: FS_EXPORT_DEFAULT_METADATA_CACHE_TTL_MS_V1, + async_backfill_enabled: true, rpc_paths: export_rpc_paths_for_export_name_v1(name), } } @@ -6518,6 +6524,8 @@ mod tests { access_config.clone(), Arc::new(FluxonFsGlobalConfig { stale_window_ms: 0, + write_session_target_inflight_bytes: + FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1, rules: Vec::new(), exports: BTreeMap::new(), }), @@ -6814,6 +6822,8 @@ max-background-jobs = {TEST_TIKV_RAFTDB_MAX_BACKGROUND_JOBS}\n" test_gateway_access_config(), Arc::new(FluxonFsGlobalConfig { stale_window_ms: 0, + write_session_target_inflight_bytes: + FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1, rules: Vec::new(), exports: BTreeMap::new(), }), @@ -6918,6 +6928,8 @@ max-background-jobs = 
{TEST_TIKV_RAFTDB_MAX_BACKGROUND_JOBS}\n" } let fs_cache = FluxonFsGlobalConfig { stale_window_ms: 0, + write_session_target_inflight_bytes: + FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1, rules: Vec::new(), exports, }; @@ -6992,6 +7004,8 @@ max-background-jobs = {TEST_TIKV_RAFTDB_MAX_BACKGROUND_JOBS}\n" } let fs_cache = FluxonFsGlobalConfig { stale_window_ms: 0, + write_session_target_inflight_bytes: + FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1, rules: Vec::new(), exports, }; @@ -7038,6 +7052,8 @@ max-background-jobs = {TEST_TIKV_RAFTDB_MAX_BACKGROUND_JOBS}\n" access_config, Arc::new(FluxonFsGlobalConfig { stale_window_ms: 0, + write_session_target_inflight_bytes: + FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1, rules: Vec::new(), exports: BTreeMap::new(), }), @@ -10780,6 +10796,8 @@ max-background-jobs = {TEST_TIKV_RAFTDB_MAX_BACKGROUND_JOBS}\n" let fs_cache = FluxonFsGlobalConfig { stale_window_ms: 0, + write_session_target_inflight_bytes: + FS_CACHE_DEFAULT_WRITE_SESSION_TARGET_INFLIGHT_BYTES_V1, rules: Vec::new(), exports: BTreeMap::new(), }; diff --git a/fluxon_rs/fluxon_kv/src/external_client_api/external_client_test.rs b/fluxon_rs/fluxon_kv/src/external_client_api/external_client_test.rs index b55f161..da701cd 100644 --- a/fluxon_rs/fluxon_kv/src/external_client_api/external_client_test.rs +++ b/fluxon_rs/fluxon_kv/src/external_client_api/external_client_test.rs @@ -11,7 +11,12 @@ use limit_thirdparty::tokio::{self}; use std::time::{Duration, Instant}; use tracing::info; -fn new_master_config(instance_key: &str, port: u16, cluster: &str, etcd: &str) -> MasterConfig { +fn new_master_config( + instance_key: &str, + port: Option<u16>, + cluster: &str, + etcd: &str, +) -> MasterConfig { let prometheus_base_url = fluxon_util::dev_config::load_tsdb_base_url() .expect("read prometheus_base_url from build_config_ext.yml (key: prom)"); let prom_remote_write_url = @@ -24,7 +29,7 @@ fn new_master_config(instance_key: &str, port: u16, 
cluster: &str, etcd: &str) - MasterConfig { instance_key: instance_key.to_string(), cluster_name: cluster.to_string(), - port: Some(port), + port, etcd_endpoints: vec![etcd.to_string()], protocol: ProtocolConfig { protocol_type: ProtocolType::Tcp, @@ -144,7 +149,7 @@ async fn test_external_client_basic_crud() { std::fs::create_dir_all(shm_path).unwrap(); // Start master - let master_cfg = new_master_config("ext_test_master", 50120, cluster, &etcd); + let master_cfg = new_master_config("ext_test_master", None, cluster, &etcd); let (master_fw, _) = run_master(ConfigArg::Config(master_cfg)) .await .expect("start master"); @@ -306,7 +311,7 @@ pub async fn test_external_client_lifetime() { info!("[ELT-SETUP] cluster='{}', shm_path='{}'", cluster, shm_path); // Start master - let master_cfg = new_master_config("ext_lt_master", 50130, cluster, &etcd); + let master_cfg = new_master_config("ext_lt_master", None, cluster, &etcd); let (master_fw, _) = run_master(ConfigArg::Config(master_cfg)) .await .expect("start master"); diff --git a/fluxon_rs/fluxon_kv/src/kvcore_test_lib.rs b/fluxon_rs/fluxon_kv/src/kvcore_test_lib.rs index 778666f..c74b64a 100644 --- a/fluxon_rs/fluxon_kv/src/kvcore_test_lib.rs +++ b/fluxon_rs/fluxon_kv/src/kvcore_test_lib.rs @@ -47,13 +47,13 @@ fn test_cluster_name(master_key: &str) -> String { /// Use shared test workdir base from fluxon_util (merged into test_util) use fluxon_util::test_util::test_workdir_base; -pub fn new_master_config(instance_key: &str, port: u16) -> MasterConfig { +pub fn new_master_config(instance_key: &str, port: Option<u16>) -> MasterConfig { new_master_config_with_cluster(instance_key, port, LEASE_TEST_CLUSTER) } fn new_master_config_with_cluster( instance_key: &str, - port: u16, + port: Option<u16>, cluster_name: &str, ) -> MasterConfig { let etcd = fluxon_util::dev_config::read_etcd_endpoint_from_build_config() @@ -76,7 +76,7 @@ fn new_master_config_with_cluster( let conf = MasterConfig { instance_key: 
instance_key.to_string(), 
cluster_name: cluster_name.to_string(), - port: Some(port), + port, etcd_endpoints: vec![etcd.clone()], protocol: ProtocolConfig { protocol_type: ProtocolType::Tcp, @@ -157,14 +157,13 @@ fn new_client_config_with_cluster_and_dram( pub async fn start_master_and_client( master_key: &str, client_key: &str, - port: u16, ) -> (Arc, Arc) { let cluster_name = test_cluster_name(master_key); clean_etcd_members(&cluster_name).await; let (master_fw, _) = run_master(ConfigArg::Config(new_master_config_with_cluster( master_key, - port, + None, &cluster_name, ))) // Start the lease cleanup task for the master @@ -197,7 +196,6 @@ pub async fn start_master_and_client( pub async fn start_master_and_client_with_client_dram( master_key: &str, client_key: &str, - port: u16, client_dram_bytes: u64, ) -> (Arc, Arc) { let cluster_name = test_cluster_name(master_key); @@ -205,7 +203,7 @@ pub async fn start_master_and_client_with_client_dram( let (master_fw, _) = run_master(ConfigArg::Config(new_master_config_with_cluster( master_key, - port, + None, &cluster_name, ))) .await diff --git a/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs b/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs index d945057..5c20cc1 100755 --- a/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs +++ b/fluxon_rs/fluxon_kv/src/master_lease_manager/lease_manager_test.rs @@ -22,8 +22,7 @@ async fn test1_lease_expire_removes_keys() { unsafe { std::env::set_var("FLUXON_LOG", "debug"); } - let (master_fw, client_fw) = - start_master_and_client("lease_master_t1", "lease_client_t1", 18081).await; + let (master_fw, client_fw) = start_master_and_client("lease_master_t1", "lease_client_t1").await; let client_view = client_fw.client_kv_api_view(); wait_master_ready(&client_view).await; @@ -83,8 +82,7 @@ async fn test2_rebind_to_new_lease_preserves_until_new_expire() { unsafe { std::env::set_var("FLUXON_LOG", "debug"); } - let (master_fw, client_fw) = - 
start_master_and_client("lease_master_t2", "lease_client_t2", 18082).await; + let (master_fw, client_fw) = start_master_and_client("lease_master_t2", "lease_client_t2").await; let client_view = client_fw.client_kv_api_view(); wait_master_ready(&client_view).await; @@ -163,8 +161,7 @@ async fn test3_keepalive() { unsafe { std::env::set_var("FLUXON_LOG", "debug"); } - let (master_fw, client_fw) = - start_master_and_client("lease_master_t3", "lease_client_t3", 18083).await; + let (master_fw, client_fw) = start_master_and_client("lease_master_t3", "lease_client_t3").await; let client_view = client_fw.client_kv_api_view(); wait_master_ready(&client_view).await; @@ -239,8 +236,7 @@ async fn test4_delete_under_lease_then_get_fails() { unsafe { std::env::set_var("FLUXON_LOG", "debug"); } - let (master_fw, client_fw) = - start_master_and_client("lease_master_t4", "lease_client_t4", 18084).await; + let (master_fw, client_fw) = start_master_and_client("lease_master_t4", "lease_client_t4").await; let client_view = client_fw.client_kv_api_view(); wait_master_ready(&client_view).await; @@ -294,7 +290,6 @@ async fn test5_eviction_when_lease_consumes_space() { let (master_fw, client_fw) = crate::kvcore_test_lib::start_master_and_client_with_client_dram( "lease_master_t5", "lease_client_t5", - 18085, 1024 * 1024 * 100, ) .await; diff --git a/fluxon_rs/fluxon_kv/src/memholder/memholder_test.rs b/fluxon_rs/fluxon_kv/src/memholder/memholder_test.rs index 7bc7a70..692a9a0 100644 --- a/fluxon_rs/fluxon_kv/src/memholder/memholder_test.rs +++ b/fluxon_rs/fluxon_kv/src/memholder/memholder_test.rs @@ -27,7 +27,12 @@ fn read_etcd() -> String { .expect("read etcd endpoint from build_config_ext.yml") } -fn new_master_config(instance_key: &str, port: u16, cluster: &str, etcd: &str) -> MasterConfig { +fn new_master_config( + instance_key: &str, + port: Option<u16>, + cluster: &str, + etcd: &str, +) -> MasterConfig { let prometheus_base_url = 
fluxon_util::dev_config::load_tsdb_base_url() .expect("read 
prometheus_base_url from build_config_ext.yml (key: prom)"); let prom_remote_write_url = @@ -36,7 +41,7 @@ fn new_master_config(instance_key: &str, port: u16, cluster: &str, etcd: &str) - MasterConfig { instance_key: instance_key.to_string(), cluster_name: cluster.to_string(), - port: Some(port), + port, etcd_endpoints: vec![etcd.to_string()], protocol: ProtocolConfig { protocol_type: ProtocolType::Tcp, @@ -257,7 +262,7 @@ pub mod test_memholder { ); let (master, _) = run_master(ConfigArg::Config(new_master_config( "mh_master", - 50090, + None, &cluster, &etcd, ))) @@ -408,7 +413,7 @@ pub mod test_memholder { let cluster = unique_cluster_name("test_cluster_memholder_pin"); let (master, _) = run_master(ConfigArg::Config(new_master_config( "pin_master", - 50100, + None, &cluster, &etcd, ))) diff --git a/fluxon_rs/fluxon_util/src/dev_config.rs b/fluxon_rs/fluxon_util/src/dev_config.rs index 5acb92a..6f4293e 100644 --- a/fluxon_rs/fluxon_util/src/dev_config.rs +++ b/fluxon_rs/fluxon_util/src/dev_config.rs @@ -1,4 +1,4 @@ -use anyhow::{Context, Result, anyhow}; +use anyhow::{anyhow, Context, Result}; use serde_yaml::Value; use std::fs; use std::path::{Path, PathBuf}; @@ -21,8 +21,7 @@ pub fn find_file_upwards>(start: P, filename: &str) -> Option bool { - path.join("fluxon_rs").join("Cargo.toml").is_file() - && path.join("fluxon_test_stack").is_dir() + path.join("fluxon_rs").join("Cargo.toml").is_file() && path.join("fluxon_test_stack").is_dir() } fn find_fluxon_repo_root_upwards(start: &Path) -> Option { @@ -82,7 +81,11 @@ pub fn repo_root() -> Result { /// Locate `build_config_ext.yml` by walking upwards from the repo/workspace anchor. 
pub fn locate_build_ext_config() -> Result<PathBuf> { let anchor = repo_root()?; - if let Some(path) = find_file_upwards(&anchor, "build_config_ext.yml") { + locate_build_ext_config_from_anchor(&anchor) +} + +fn locate_build_ext_config_from_anchor(anchor: &Path) -> Result<PathBuf> { + if let Some(path) = find_file_upwards(anchor, "build_config_ext.yml") { return Ok(path); } Err(anyhow!( @@ -260,7 +263,10 @@ pub fn load_tsdb_remote_write_url() -> Result { #[cfg(test)] mod tests { - use super::{find_fluxon_repo_root_upwards, repo_root_from_manifest_dir}; + use super::{ + find_fluxon_repo_root_upwards, locate_build_ext_config_from_anchor, + repo_root_from_manifest_dir, + }; use std::fs; use tempfile::TempDir; @@ -268,17 +274,36 @@ mod tests { fn find_fluxon_repo_root_prefers_nearest_nested_fluxon_tree() { let temp_dir = TempDir::new().expect("temp dir"); let outer_root = temp_dir.path().join("outer_checkout"); - let nested_root = outer_root.join("runner_run").join("results").join("case_1").join("run_1").join("src"); + let nested_root = outer_root + .join("runner_run") + .join("results") + .join("case_1") + .join("run_1") + .join("src"); fs::create_dir_all(outer_root.join(".git")).expect("create outer .git"); fs::create_dir_all(outer_root.join("fluxon_rs")).expect("create outer fluxon_rs dir"); - fs::create_dir_all(outer_root.join("fluxon_test_stack")).expect("create outer fluxon_test_stack dir"); - fs::write(outer_root.join("fluxon_rs").join("Cargo.toml"), "[workspace]\n").expect("write outer cargo toml"); + fs::create_dir_all(outer_root.join("fluxon_test_stack")) + .expect("create outer fluxon_test_stack dir"); + fs::write( + outer_root.join("fluxon_rs").join("Cargo.toml"), + "[workspace]\n", + ) + .expect("write outer cargo toml"); fs::create_dir_all(nested_root.join("fluxon_rs")).expect("create nested fluxon_rs dir"); - fs::create_dir_all(nested_root.join("fluxon_test_stack")).expect("create nested fluxon_test_stack dir"); - 
fs::write(nested_root.join("fluxon_rs").join("Cargo.toml"), "[workspace]\n").expect("write nested cargo toml"); - fs::write(nested_root.join("build_config_ext.yml"), "etcd: 127.0.0.1:2379\n").expect("write nested build_config_ext"); + fs::create_dir_all(nested_root.join("fluxon_test_stack")) + .expect("create nested fluxon_test_stack dir"); + fs::write( + nested_root.join("fluxon_rs").join("Cargo.toml"), + "[workspace]\n", + ) + .expect("write nested cargo toml"); + fs::write( + nested_root.join("build_config_ext.yml"), + "etcd: 127.0.0.1:2379\n", + ) + .expect("write nested build_config_ext"); let nested_manifest_dir = nested_root.join("fluxon_rs").join("fluxon_kv"); fs::create_dir_all(&nested_manifest_dir).expect("create nested manifest dir"); @@ -291,22 +316,68 @@ mod tests { fn repo_root_from_manifest_dir_uses_nearest_fluxon_repo_root() { let temp_dir = TempDir::new().expect("temp dir"); let outer_root = temp_dir.path().join("outer_checkout"); - let nested_root = outer_root.join("runner_run").join("results").join("case_1").join("run_1").join("src"); + let nested_root = outer_root + .join("runner_run") + .join("results") + .join("case_1") + .join("run_1") + .join("src"); fs::create_dir_all(outer_root.join(".git")).expect("create outer .git"); fs::create_dir_all(outer_root.join("fluxon_rs")).expect("create outer fluxon_rs dir"); - fs::create_dir_all(outer_root.join("fluxon_test_stack")).expect("create outer fluxon_test_stack dir"); - fs::write(outer_root.join("fluxon_rs").join("Cargo.toml"), "[workspace]\n").expect("write outer cargo toml"); - fs::write(outer_root.join("build_config_ext.yml"), "etcd: 10.0.0.1:2379\n").expect("write outer build_config_ext"); + fs::create_dir_all(outer_root.join("fluxon_test_stack")) + .expect("create outer fluxon_test_stack dir"); + fs::write( + outer_root.join("fluxon_rs").join("Cargo.toml"), + "[workspace]\n", + ) + .expect("write outer cargo toml"); + fs::write( + outer_root.join("build_config_ext.yml"), + "etcd: 
10.0.0.1:2379\n", + ) + .expect("write outer build_config_ext"); fs::create_dir_all(nested_root.join("fluxon_rs")).expect("create nested fluxon_rs dir"); - fs::create_dir_all(nested_root.join("fluxon_test_stack")).expect("create nested fluxon_test_stack dir"); - fs::write(nested_root.join("fluxon_rs").join("Cargo.toml"), "[workspace]\n").expect("write nested cargo toml"); - fs::write(nested_root.join("build_config_ext.yml"), "etcd: 127.0.0.1:2379\n").expect("write nested build_config_ext"); + fs::create_dir_all(nested_root.join("fluxon_test_stack")) + .expect("create nested fluxon_test_stack dir"); + fs::write( + nested_root.join("fluxon_rs").join("Cargo.toml"), + "[workspace]\n", + ) + .expect("write nested cargo toml"); + fs::write( + nested_root.join("build_config_ext.yml"), + "etcd: 127.0.0.1:2379\n", + ) + .expect("write nested build_config_ext"); let nested_manifest_dir = nested_root.join("fluxon_rs").join("fluxon_util"); fs::create_dir_all(&nested_manifest_dir).expect("create nested fluxon_util dir"); let repo_root = repo_root_from_manifest_dir(&nested_manifest_dir); assert_eq!(repo_root, nested_root); } + + #[test] + fn locate_build_ext_config_from_anchor_uses_repo_visible_build_config() { + let temp_dir = TempDir::new().expect("temp dir"); + let repo_root = temp_dir.path().join("run_1").join("src"); + let nested_manifest_dir = repo_root.join("fluxon_rs").join("fluxon_util"); + let build_config_path = repo_root.join("build_config_ext.yml"); + fs::create_dir_all(repo_root.join("fluxon_test_stack")) + .expect("create fluxon_test_stack dir"); + fs::create_dir_all(repo_root.join("fluxon_rs")).expect("create fluxon_rs dir"); + fs::write( + repo_root.join("fluxon_rs").join("Cargo.toml"), + "[workspace]\n", + ) + .expect("write workspace cargo toml"); + fs::create_dir_all(&nested_manifest_dir).expect("create nested manifest dir"); + fs::write(&build_config_path, "etcd: 127.0.0.1:2379\n").expect("write build config"); + + let anchor = 
repo_root_from_manifest_dir(&nested_manifest_dir); + let located = + locate_build_ext_config_from_anchor(&anchor).expect("locate build config from anchor"); + assert_eq!(located, build_config_path); + } } diff --git a/fluxon_rs/fluxon_util/src/lib.rs b/fluxon_rs/fluxon_util/src/lib.rs index e575a75..3c5d5a0 100644 --- a/fluxon_rs/fluxon_util/src/lib.rs +++ b/fluxon_rs/fluxon_util/src/lib.rs @@ -182,10 +182,43 @@ pub fn build_target_dir_() -> PathBuf { mod tests { use crate::{current_log_file_path, init_log}; use std::fs; + use std::path::Path; use std::path::PathBuf; use tempfile::TempDir; use tracing::{debug, error, info, warn}; + fn wait_for_log_file(active_log_path: &Path) { + let deadline = std::time::Instant::now() + std::time::Duration::from_secs(2); + loop { + if active_log_path.exists() { + return; + } + assert!( + std::time::Instant::now() < deadline, + "active log file should exist: {}", + active_log_path.display() + ); + std::thread::sleep(std::time::Duration::from_millis(20)); + } + } + + fn assert_logged_text(active_log_path: &Path, needles: &[&str]) { + let deadline = std::time::Instant::now() + std::time::Duration::from_secs(2); + loop { + if let Ok(content) = fs::read_to_string(active_log_path) { + if needles.iter().all(|needle| content.contains(needle)) { + return; + } + } + assert!( + std::time::Instant::now() < deadline, + "log file did not contain all expected records in time: {}", + active_log_path.display() + ); + std::thread::sleep(std::time::Duration::from_millis(20)); + } + } + #[cfg(trybuild)] #[test] fn trybuild_scoped_sync_async_bridge() { @@ -195,11 +228,13 @@ mod tests { } #[test] + #[serial_test::serial(log_init)] fn test_init_log_with_file_path() { // 创建临时目录用于日志文件 let temp_dir = TempDir::new().expect("Failed to create temp directory"); let log_path = temp_dir.path(); let instance_key = "test_instance"; + let previous_path = current_log_file_path(); // 初始化日志系统 init_log(log_path, instance_key); @@ -213,45 +248,34 @@ mod tests { // 
等待日志写入 std::thread::sleep(std::time::Duration::from_millis(100)); - // 验证日志文件是否创建 - let log_key = instance_key; - let mut log_file_found = false; - - // 遍历日志目录,查找日志文件 - for entry in fs::read_dir(log_path).expect("Failed to read log directory") { - let entry = entry.expect("Failed to read entry"); - let file_name = entry.file_name(); - let file_name_str = file_name.to_string_lossy(); - - if file_name_str.contains(log_key) && file_name_str.contains(".log") { - log_file_found = true; - if current_log_file_path() - .as_ref() - .is_some_and(|path| path.starts_with(log_path)) - { - let content = fs::read_to_string(entry.path()).expect("Failed to read log"); - assert!( - content.contains("debug message"), - "Log should contain debug" - ); - assert!(content.contains("info message"), "Log should contain info"); - assert!( - content.contains("warning message"), - "Log should contain warning" - ); - assert!( - content.contains("error message"), - "Log should contain error" - ); - } - } + let active_log_path = current_log_file_path().expect("active log file path should exist"); + if let Some(ref previous_path) = previous_path { + assert_eq!( + active_log_path, *previous_path, + "init_log should preserve the first active log file path within a process" + ); + } else { + wait_for_log_file(&active_log_path); + } + if previous_path.is_none() && active_log_path.starts_with(log_path) { + let file_name = active_log_path + .file_name() + .expect("active log file name") + .to_string_lossy(); + assert!( + file_name.contains(instance_key), + "active log file should include instance key when this test owns initialization" + ); + assert_logged_text( + &active_log_path, + &["debug message", "info message", "warning message", "error message"], + ); } - - assert!(log_file_found, "Log file should be created"); } // 移除“不指定日志路径”的测试:生产入口强制要求提供 log_path。 #[test] + #[serial_test::serial(log_init)] fn test_init_log_invalid_path() { // 测试无效路径的处理 let invalid_path = 
PathBuf::from("/proc/invalid_path_that_cannot_be_created/logs"); @@ -266,11 +290,13 @@ mod tests { // 移除 init_log_test 相关测试:测试不再使用测试专用 logger。 #[test] + #[serial_test::serial(log_init)] fn test_log_file_rotation() { // 测试日志文件按天滚动的功能 let temp_dir = TempDir::new().expect("Failed to create temp directory"); let log_path = temp_dir.path(); let instance_key = "rotation_test"; + let previous_path = current_log_file_path(); // 初始化日志 init_log(log_path, instance_key); @@ -284,20 +310,32 @@ mod tests { // 等待日志写入 std::thread::sleep(std::time::Duration::from_millis(100)); - // 验证文件存在 - let files: Vec<_> = fs::read_dir(log_path) - .expect("Failed to read log directory") - .filter_map(|e| e.ok()) - .map(|e| e.file_name().to_string_lossy().to_string()) - .collect(); - - assert!( - files.iter().any(|f| f.contains("fluxon-kv-rotation_test")), - "Log files should be created with correct instance key" - ); + let active_log_path = current_log_file_path().expect("active log file path should exist"); + if let Some(ref previous_path) = previous_path { + assert_eq!( + active_log_path, *previous_path, + "init_log should preserve the first active log file path within a process" + ); + } else { + wait_for_log_file(&active_log_path); + assert!( + active_log_path.starts_with(log_path), + "first init_log call should bind to the requested directory" + ); + let file_name = active_log_path + .file_name() + .expect("active log file name") + .to_string_lossy(); + assert!( + file_name.contains("fluxon-kv-rotation_test"), + "first init_log call should use the requested instance key" + ); + assert_logged_text(&active_log_path, &["Log message 0", "Warning message 0"]); + } } #[test] + #[serial_test::serial(log_init)] fn test_multiple_init_log_calls() { // 测试多次调用 init_log 的行为 let temp_dir = TempDir::new().expect("Failed to create temp directory"); @@ -306,6 +344,7 @@ mod tests { // 第一次初始化 init_log(log_path, "instance1"); info!("First init message"); + let first_path = 
current_log_file_path().expect("first active log file path"); // 第二次初始化(应该被忽略,因为 try_init 会失败) init_log(log_path, "instance2"); @@ -313,5 +352,10 @@ mod tests { // 验证不会崩溃 std::thread::sleep(std::time::Duration::from_millis(100)); + assert_eq!( + current_log_file_path().as_ref(), + Some(&first_path), + "multiple init_log calls should preserve the first active log file path" + ); } } diff --git a/fluxon_rs/fluxon_util/src/log.rs b/fluxon_rs/fluxon_util/src/log.rs index fc6066f..2f8de8b 100644 --- a/fluxon_rs/fluxon_util/src/log.rs +++ b/fluxon_rs/fluxon_util/src/log.rs @@ -611,6 +611,11 @@ pub fn init_log_test(test_case_name: &str) { .join("tests") .join(&case); fs::create_dir_all(&dir).expect("create test log dir"); + if std::env::var_os("FLUXON_LOG").is_none() { + unsafe { + std::env::set_var("FLUXON_LOG", "debug"); + } + } // Use test_case_name as instance key so file names are recognizable. init_log(&dir, &case); } diff --git a/fluxon_rs/fluxon_util/src/merge_recent_async_notifies/tests.rs b/fluxon_rs/fluxon_util/src/merge_recent_async_notifies/tests.rs index 2e505f7..b9b9bde 100755 --- a/fluxon_rs/fluxon_util/src/merge_recent_async_notifies/tests.rs +++ b/fluxon_rs/fluxon_util/src/merge_recent_async_notifies/tests.rs @@ -5,11 +5,9 @@ use tokio::sync::mpsc; use tracing::info; #[tokio::test(flavor = "multi_thread", worker_threads = 8)] +#[serial_test::serial(log_init)] async fn test_async_notification_merger_poll() { - // 初始化测试日志(落盘到统一测试目录);级别可通过环境变量控制 - unsafe { - std::env::set_var("FLUXON_LOG", "debug"); - } + // Initialize test logs under the shared test workdir. 
init_log_test("merge_recent_async_notifies_poll"); let (tx, rx) = mpsc::unbounded_channel::(); @@ -98,11 +96,9 @@ async fn test_async_notification_merger_poll() { } #[tokio::test(flavor = "multi_thread", worker_threads = 8)] +#[serial_test::serial(log_init)] async fn test_user_controlled_loop() { - // 初始化测试日志(第二个用例单独目录) - unsafe { - std::env::set_var("FLUXON_LOG", "debug"); - } + // Initialize test logs under the shared test workdir. init_log_test("merge_recent_async_notifies_user_loop"); let (tx, rx) = mpsc::unbounded_channel::(); let stream = tokio_stream::wrappers::UnboundedReceiverStream::new(rx); diff --git a/fluxon_rs/fluxon_util/src/test_util.rs b/fluxon_rs/fluxon_util/src/test_util.rs index 93ea065..8a65eda 100755 --- a/fluxon_rs/fluxon_util/src/test_util.rs +++ b/fluxon_rs/fluxon_util/src/test_util.rs @@ -170,6 +170,9 @@ pub fn start_test_etcd() -> Result<(), Box> { let mut guard = etcd_process() .lock() .map_err(|_| boxed_error("test etcd process mutex poisoned"))?; + if endpoint_health(&endpoint, Duration::from_secs(2)) { + return Ok(()); + } if let Some(child) = guard.as_mut() { if child.try_wait()?.is_none() { wait_for_etcd_ready(child, &endpoint)?; @@ -239,13 +242,17 @@ pub fn start_test_etcd() -> Result<(), Box> { )) })?; - wait_for_etcd_ready(&mut child, &endpoint).map_err(|e| { + if let Err(e) = wait_for_etcd_ready(&mut child, &endpoint) { + if endpoint_health(&endpoint, Duration::from_secs(2)) { + let _ = child.wait(); + return Ok(()); + } let stdout_hint = read_log_tail(&stdout_path); let stderr_hint = read_log_tail(&stderr_path); - boxed_error(format!( + return Err(boxed_error(format!( "{e}\netcd stdout tail:\n{stdout_hint}\netcd stderr tail:\n{stderr_hint}" - )) - })?; + ))); + } *guard = Some(child); Ok(()) } diff --git a/fluxon_rs/fluxon_util/src/test_util_test.rs b/fluxon_rs/fluxon_util/src/test_util_test.rs index adaa516..3dc77dd 100755 --- a/fluxon_rs/fluxon_util/src/test_util_test.rs +++ b/fluxon_rs/fluxon_util/src/test_util_test.rs @@ 
-1,8 +1,77 @@ +// This file contains tests for the test utility helpers. + use crate::test_util::{is_etcd_running, start_test_etcd}; +use std::fs; +use std::net::TcpListener; +use std::path::PathBuf; use std::process::Command; +use std::sync::{Mutex, OnceLock}; +struct BuildConfigExtGuard { + path: PathBuf, + previous: Option<Vec<u8>>, +} + +impl BuildConfigExtGuard { + fn install(contents: String) -> Self { + let path = crate::dev_config::repo_root() + .expect("repo root") + .join("build_config_ext.yml"); + let previous = fs::read(&path).ok(); + fs::write(&path, contents).expect("write test build_config_ext"); + Self { path, previous } + } +} + +impl Drop for BuildConfigExtGuard { + fn drop(&mut self) { + match self.previous.as_deref() { + Some(previous) => { + fs::write(&self.path, previous).expect("restore previous build_config_ext"); + } + None => { + if self.path.exists() { + fs::remove_file(&self.path).expect("remove test build_config_ext"); + } + } + } + } +} + +fn build_config_ext_lock() -> &'static Mutex<()> { + static BUILD_CONFIG_MUTEX: OnceLock<Mutex<()>> = OnceLock::new(); + BUILD_CONFIG_MUTEX.get_or_init(|| Mutex::new(())) +} + +fn pick_free_etcd_port_pair() -> (u16, u16) { + for _ in 0..32 { + let client_socket = TcpListener::bind(("127.0.0.1", 0)).expect("bind etcd client port"); + let client_port = client_socket + .local_addr() + .expect("read etcd client port") + .port(); + let peer_port = if client_port == u16::MAX { + client_port - 1 + } else { + client_port + 1 + }; + if TcpListener::bind(("127.0.0.1", peer_port)).is_ok() { + drop(client_socket); + return (client_port, peer_port); + } + } + panic!("failed to reserve a free etcd port pair"); +} + +fn install_test_build_config_ext() -> BuildConfigExtGuard { + let (client_port, _peer_port) = pick_free_etcd_port_pair(); + BuildConfigExtGuard::install(format!("etcd: 127.0.0.1:{client_port}\n")) +} #[test] +#[serial_test::serial(build_config_ext)] fn test_etcd_only_starts_once() { + let _build_config_lock = 
build_config_ext_lock().lock().expect("lock build config"); + let _temp_build_config = install_test_build_config_ext(); start_test_etcd().expect("start local test etcd"); assert!(is_etcd_running(), "etcd should be reachable after startup"); diff --git a/fluxon_test_stack/ci_2_virt_node.py b/fluxon_test_stack/ci_2_virt_node.py index 405c9a2..f055426 100644 --- a/fluxon_test_stack/ci_2_virt_node.py +++ b/fluxon_test_stack/ci_2_virt_node.py @@ -415,11 +415,13 @@ def _rewrite_suite_for_local_dual_nodes( if scene_configs is not None: if not isinstance(scene_configs, dict): raise ValueError("generated public profile runtime.ci.scene_configs must be a mapping") - kv_scene_config = scene_configs.get("ci_top_attention_bin_kvtest") - if kv_scene_config is not None: + for scene_id in ("ci_top_attention_bin_kvtest", "ci_top_attention_cargo_kv_unit"): + kv_scene_config = scene_configs.get(scene_id) + if kv_scene_config is None: + continue if not isinstance(kv_scene_config, dict): raise ValueError( - "generated public profile runtime.ci.scene_configs['ci_top_attention_bin_kvtest'] must be a mapping" + f"generated public profile runtime.ci.scene_configs[{scene_id!r}] must be a mapping" ) # The generated public profile is fixed to the tcp-thread transport branch. 
kv_scene_config["kv_transport_feature"] = PUBLIC_TRANSPORT_FEATURE diff --git a/fluxon_test_stack/ci_test_list.yaml b/fluxon_test_stack/ci_test_list.yaml index 4230559..ebafe4a 100644 --- a/fluxon_test_stack/ci_test_list.yaml +++ b/fluxon_test_stack/ci_test_list.yaml @@ -29,6 +29,118 @@ scenes: scales: [n1_kvowner_dram_20gib] profiles: [fluxon_tcp] + ci_top_attention_cargo_fs_core: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_util: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_20gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_kv_unit: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_20gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_cli: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_commu: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_commu_contract: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_framework: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_fs: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_20gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_fs_s3_gateway: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_20gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_limit_thirdparty: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: 
[fluxon_tcp] + + ci_top_attention_cargo_mq: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_observability: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_ops: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + + ci_top_attention_cargo_pyo3: + ci: + subject: rust + runtime_contract: rust_self_managed + select: + scales: [n1_kvowner_dram_3gib] + profiles: [fluxon_tcp] + ci_top_attention_log_mgmt: ci: subject: rust @@ -326,9 +438,12 @@ profiles: ci_top_attention_doc_page_build: doc_site_base_url: example.com ci_top_attention_bin_kvtest: + kv_transport_feature: tcp_thread_transport kv_test_rounds: all - ci_top_attention_log_mgmt: - enabled: true + ci_top_attention_cargo_fs_core: {} + ci_top_attention_cargo_util: {} + ci_top_attention_cargo_kv_unit: + kv_transport_feature: tcp_thread_transport ci_top_attention_log_mgmt: enabled: true ci_top_attention_mq_core: {} @@ -476,9 +591,12 @@ profiles: ci_top_attention_doc_page_build: doc_site_base_url: example.com ci_top_attention_bin_kvtest: + kv_transport_feature: tcp_thread_transport kv_test_rounds: all - ci_top_attention_log_mgmt: - enabled: true + ci_top_attention_cargo_fs_core: {} + ci_top_attention_cargo_util: {} + ci_top_attention_cargo_kv_unit: + kv_transport_feature: tcp_thread_transport ci_top_attention_log_mgmt: enabled: true ci_top_attention_mq_core: {} @@ -493,9 +611,12 @@ profiles: ci_top_attention_doc_page_build: doc_site_base_url: example.com ci_top_attention_bin_kvtest: + kv_transport_feature: tcp_thread_transport kv_test_rounds: all - ci_top_attention_log_mgmt: - enabled: true + ci_top_attention_cargo_fs_core: {} + ci_top_attention_cargo_util: {} + ci_top_attention_cargo_kv_unit: + kv_transport_feature: 
tcp_thread_transport ci_top_attention_log_mgmt: enabled: true ci_top_attention_mq_core: {} @@ -510,9 +631,12 @@ profiles: ci_top_attention_doc_page_build: doc_site_base_url: example.com ci_top_attention_bin_kvtest: + kv_transport_feature: tcp_thread_transport kv_test_rounds: all - ci_top_attention_log_mgmt: - enabled: true + ci_top_attention_cargo_fs_core: {} + ci_top_attention_cargo_util: {} + ci_top_attention_cargo_kv_unit: + kv_transport_feature: tcp_thread_transport ci_top_attention_log_mgmt: enabled: true ci_top_attention_mq_core: {} diff --git a/fluxon_test_stack/pack_test_stack_rsc.py b/fluxon_test_stack/pack_test_stack_rsc.py index 5d92271..a22df58 100644 --- a/fluxon_test_stack/pack_test_stack_rsc.py +++ b/fluxon_test_stack/pack_test_stack_rsc.py @@ -1203,7 +1203,13 @@ def _prepare_python_runtime_wheelhouse_into_root( dependency_sets=dependency_sets, ) existing_names = sorted(path.name for path in wheelhouse_root.glob("*.whl")) - if _wheelhouse_satisfies_specs(existing_names=existing_names, expected_specs=expected_specs): + if _wheelhouse_satisfies_specs(existing_names=existing_names, expected_specs=expected_specs) and ( + _wheelhouse_resolves_offline( + wheelhouse_root=wheelhouse_root, + python_abi=python_abi, + expected_specs=expected_specs, + ) + ): print(f"Using existing prepared TEST_STACK runtime wheelhouse: {wheelhouse_root}") return @@ -1329,6 +1335,34 @@ def _wheelhouse_satisfies_specs( return True +def _wheelhouse_resolves_offline( + *, + wheelhouse_root: Path, + python_abi: str, + expected_specs: tuple[dict[str, str], ...], +) -> bool: + python_bin = _python_executable_for_python_abi(python_abi=python_abi) + pinned_specs = [f"{spec['name']}=={spec['version']}" for spec in expected_specs] + with tempfile.TemporaryDirectory(prefix="fluxon_test_stack_wheelhouse_validate_") as td: + argv = [ + python_bin, + "-m", + "pip", + "download", + "--no-index", + "--dest", + td, + "--find-links", + str(wheelhouse_root), + ] + argv.extend(pinned_specs) + try: 
+ subprocess.check_call(argv, cwd=str(REPO_ROOT)) + except subprocess.CalledProcessError: + return False + return True + + def _download_python_runtime_wheels( *, out_dir: Path, @@ -1338,6 +1372,9 @@ def _download_python_runtime_wheels( ) -> None: abi_suffix = python_abi.removeprefix("cpython") cp_tag = "cp" + abi_suffix.replace(".", "") + # Pip evaluates dependency markers against the interpreter running pip, so + # preparing a cpython3.10 wheelhouse must run under a Python 3.10 binary. + python_bin = _python_executable_for_python_abi(python_abi=python_abi) wheel_specs: list[str] = [] sdist_specs: list[str] = [] for spec in expected_specs: @@ -1349,7 +1386,7 @@ def _download_python_runtime_wheels( if wheel_specs: argv = [ - sys.executable, + python_bin, "-m", "pip", "download", @@ -1370,7 +1407,7 @@ def _download_python_runtime_wheels( for pinned in sdist_specs: argv = [ - sys.executable, + python_bin, "-m", "pip", "wheel", @@ -1386,6 +1423,55 @@ def _download_python_runtime_wheels( "downloaded TEST_STACK runtime wheelhouse is incomplete: " f"out_dir={out_dir} expected={[spec['name'] + '==' + spec['version'] for spec in expected_specs]}" ) + if not _wheelhouse_resolves_offline( + wheelhouse_root=out_dir, + python_abi=python_abi, + expected_specs=expected_specs, + ): + raise RuntimeError( + "downloaded TEST_STACK runtime wheelhouse cannot satisfy offline dependency resolution: " + f"out_dir={out_dir} expected={[spec['name'] + '==' + spec['version'] for spec in expected_specs]}" + ) + + +def _python_executable_abi(python_bin: str) -> str: + try: + return subprocess.check_output( + [ + python_bin, + "-c", + ( + "import sys; " + "print(f'{sys.implementation.name}{sys.version_info[0]}.{sys.version_info[1]}')" + ), + ], + text=True, + ).strip() + except (OSError, subprocess.CalledProcessError) as exc: + raise RuntimeError(f"failed to probe python ABI for executable: {python_bin}") from exc + + +def _python_executable_for_python_abi(*, python_abi: str) -> str: + version 
= python_abi.removeprefix("cpython") + candidate_names: list[str] = [f"python{version}", "python3", "python"] + candidates: list[str] = [] + seen: set[str] = set() + for raw_candidate in candidate_names: + resolved = shutil.which(raw_candidate) + if resolved is None or resolved in seen: + continue + seen.add(resolved) + candidates.append(resolved) + if not candidates: + raise RuntimeError( + f"preparing TEST_STACK runtime wheelhouse for {python_abi} requires a matching Python interpreter on PATH" + ) + for python_bin in candidates: + if _python_executable_abi(python_bin) == python_abi: + return python_bin + raise RuntimeError( + f"preparing TEST_STACK runtime wheelhouse for {python_abi} requires a matching Python interpreter on PATH" + ) def _normalize_python_distribution_name(name: str) -> str: diff --git a/fluxon_test_stack/test_profile_adapter.py b/fluxon_test_stack/test_profile_adapter.py index 57afbdc..8935198 100644 --- a/fluxon_test_stack/test_profile_adapter.py +++ b/fluxon_test_stack/test_profile_adapter.py @@ -795,7 +795,6 @@ def _action_deploy( _write_yaml_file(logs_dir / "deploy_response.yaml", deploy_resp) - def _action_collect(run_dir: Path, controller_url: str, instances: List[_InstanceReq]) -> None: logs_dir = run_dir / "logs" logs_dir.mkdir(parents=True, exist_ok=True) @@ -806,8 +805,8 @@ def _action_collect(run_dir: Path, controller_url: str, instances: List[_Instanc # English note: # - /api/status is an observability endpoint. During transient runtime failures (e.g. P2P timeouts) - # the controller may return a non-2xx HTTP status. Treat that as a captured status, not as a - # hard failure of the "collect" phase, so the runner can still finalize deterministically using + # the controller may return a non-2xx HTTP status. Treat that as captured status, not as a + # hard failure of the collect phase, so the runner can still finalize deterministically using # terminal artifacts (summary.yaml / benchmark_result.json). 
status_code, status = _http_status_allow_error( controller_url, @@ -819,6 +818,7 @@ def _action_collect(run_dir: Path, controller_url: str, instances: List[_Instanc _write_yaml_file(inst_dir / "status.yaml", {"status_code": int(status_code), "status": status}) + def _action_teardown(controller_url: str, instances: List[_InstanceReq]) -> None: for inst in instances: resp = _http_delete_generation( @@ -1029,6 +1029,21 @@ def _wait_running( time.sleep(1.0) +def _http_status_allow_error( + controller_url: str, + target: str, + kind: str, + name: str, + authority: str, +) -> tuple[int, Dict[str, Any]]: + qs = urllib.parse.urlencode( + {"target": target, "kind": kind, "name": name, "authority": authority} + ) + url = controller_url + "/api/status?" + qs + req = _new_controller_request(url, method="GET") + return _http_json_allow_error_status(req) + + def _http_deploy(controller_url: str, yaml_text: str) -> Dict[str, Any]: url = controller_url + "/api/deploy" data = yaml_text.encode("utf-8") @@ -1174,21 +1189,6 @@ def _http_status(controller_url: str, target: str, kind: str, name: str) -> Dict return _http_json(req) -def _http_status_allow_error( - controller_url: str, - target: str, - kind: str, - name: str, - authority: str, -) -> tuple[int, Dict[str, Any]]: - qs = urllib.parse.urlencode( - {"target": target, "kind": kind, "name": name, "authority": authority} - ) - url = controller_url + "/api/status?" 
+ qs - req = _new_controller_request(url, method="GET") - return _http_json_allow_error_status(req) - - def _http_delete_generation( controller_url: str, target: str, diff --git a/fluxon_test_stack/test_runner.py b/fluxon_test_stack/test_runner.py index 2236be5..07d8485 100644 --- a/fluxon_test_stack/test_runner.py +++ b/fluxon_test_stack/test_runner.py @@ -120,14 +120,12 @@ CASE_FAMILY_INFER = "infer" CASE_FAMILY_CI = "ci" CASE_FAMILY_BENCH = "bench" -INFER_PATTERN_REPEAT = "REPEAT_PROMPTS" -INFER_PATTERN_UNIQUE = "UNIQUE_PROMPTS" -INFER_STACK_VLLM_LMCACHE = "VLLM_LMCACHE" -INFER_STACK_SGLANG_HICACHE = "SGLANG_HICACHE" RUN_OUTCOME_SUCCESS = "SUCCESS" RUN_OUTCOME_FAILED = "FAILED" _RUN_SUMMARY_INCOMPLETE_ERROR = "INCOMPLETE: run started but did not reach finalize; runner likely exited abruptly." _RUN_EXCEPTION_FILENAME = "exception.txt" +_DEBUG_TAIL_MAX_BYTES = 8192 +_TEST_RUNNER_DIAGNOSTIC_VERSION = 2 CI_PRESERVED_APPLY_IDS_SCHEMA_VERSION = 1 CI_PRESERVED_APPLY_IDS_FILENAME = "ci_preserved_apply_ids.yaml" CI_RUNTIME_CONTRACT_CLUSTER_KV_OWNER = "cluster_kv_owner" @@ -177,6 +175,11 @@ CI_RUNNER_SHARED_BUNDLE_TIMEOUT_S = 600 CI_RUNNER_READINESS_PROBE_DEADLINE_S = 120 CI_RUNNER_EXIT_CODE_GRACE_TIMEOUT_S = 300 +CI_RUNNER_TERMINAL_EXIT_CODE_FILE_GRACE_S = 15.0 +CI_RUNNER_STDOUT_TERMINAL_EXIT_CODE_RE = re.compile( + r"^\[ci_runner\] (?:wrote|found existing) exit_code=(-?[0-9]+); holding until controller stop$", + re.MULTILINE, +) TEST_STACK_REMOTE_STAGE_SHARED_INCLUDE_RELPATHS = ( "benchmark_config.py", "deployer_deploy.yaml", @@ -407,9 +410,22 @@ def _runner_native_ci_scene_ids() -> Tuple[str, ...]: return ( "ci_top_attention_doc_page_build", "ci_top_attention_bin_kvtest", + "ci_top_attention_cargo_fs_core", + "ci_top_attention_cargo_util", + "ci_top_attention_cargo_kv_unit", + "ci_top_attention_cargo_cli", + "ci_top_attention_cargo_commu", + "ci_top_attention_cargo_commu_contract", + "ci_top_attention_cargo_framework", + "ci_top_attention_cargo_fs", + 
"ci_top_attention_cargo_fs_s3_gateway", + "ci_top_attention_cargo_limit_thirdparty", + "ci_top_attention_cargo_mq", + "ci_top_attention_cargo_observability", + "ci_top_attention_cargo_ops", + "ci_top_attention_cargo_pyo3", "ci_top_attention_log_mgmt", "ci_top_attention_mq_core", - "ci_top_attention_log_mgmt", ) @@ -550,7 +566,7 @@ def _redirect_process_stdio_to_log( - test_runner can run for hours under terminal/session wrappers that may disappear while the suite is still executing. - A deleted PTY turns ordinary `print(..., flush=True)` into `OSError(EIO)`, which aborts the - runner in collect/finalize paths and leaves case_runs.yaml stuck at a reserved run. + runner in shutdown/finalize paths and leaves case_runs.yaml stuck at a reserved run. - Use a deterministic per-workdir log sink for the whole process, including child subprocesses. """ global _RUNNER_STDIO_LOG_FP @@ -753,6 +769,12 @@ def main() -> None: _ui_history_register_workdir(workdir_root) _redirect_process_stdio_to_log(workdir_root) + print( + "[TEST_RUNNER diag] " + f"version={_TEST_RUNNER_DIAGNOSTIC_VERSION} action={action} " + f"script={Path(__file__).resolve()} workdir={workdir_root.resolve()}", + flush=True, + ) if action == "clean": _clean_workdir(workdir_root) @@ -865,6 +887,7 @@ def main() -> None: ) suite_failed = False + failed_case_summaries: List[str] = [] for planned_case in scheduled: case = planned_case.case if suite.run_mode == RUN_MODE_FULL_ONCE and planned_case.counted: @@ -936,6 +959,7 @@ def main() -> None: case_plan: Optional[_CasePlan] = None case_error: Optional[str] = None finalize_error: Optional[str] = None + case_debug_emitted = False try: resolved_case = _build_resolved_case_yaml( @@ -959,7 +983,7 @@ def main() -> None: ) if _case_family_uses_case_plan(case_family): case_plan = _compile_case_plan(resolved_case) - if case_family in (CASE_FAMILY_INFER, CASE_FAMILY_CI, CASE_FAMILY_BENCH): + if case_family in (CASE_FAMILY_CI, CASE_FAMILY_BENCH): 
_apply_stable_deploy_names(resolved_case) _sync_case_runtime_model_from_deploy(resolved_case) @@ -975,40 +999,7 @@ def main() -> None: case_plan=case_plan, runtime_tracking=runtime_tracking, ) - if case_family == CASE_FAMILY_INFER: - _ensure_deployer_online(resolved_case) - _write_deployer_manifests(resolved_case, run_dir, allow_overwrite=False) - - infer_deploy_attempted = True - deploy_result = _run_adapter_action( - resolved_case, run_dir=run_dir, action="deploy" - ) - _validate_deploy_result(resolved_case, deploy_result) - - endpoint_url = _resolved_endpoint_url(resolved_case, deploy_result) - _tcp_check_endpoint(endpoint_url) - - infer_out = _run_infer_ai_perf(resolved_case, deploy_result, run_dir) - summary = _build_infer_summary_yaml( - resolved_case, - deploy_result, - run_index=run_slot.run_index, - started_at_unix_s=started_at, - finished_at_unix_s=int(time.time()), - outcome=RUN_OUTCOME_SUCCESS, - counted=False, - ai_perf_out=infer_out, - ) - _write_yaml_file(run_dir / "summary.yaml", summary) - - _run_adapter_action( - resolved_case, run_dir=run_dir, action="collect" - ) - - outcome = RUN_OUTCOME_SUCCESS - - - elif _case_family_uses_case_plan(case_family): + if _case_family_uses_case_plan(case_family): if case_plan is None: raise ValueError(f"internal error: case_plan is missing for case_family={case_family}") prepared_case = _prepare_case( @@ -1044,23 +1035,17 @@ def main() -> None: except Exception as write_exc: # noqa: BLE001 case_error = f"{type(exc).__name__}: {exc} (failed to write {_RUN_EXCEPTION_FILENAME}: {type(write_exc).__name__}: {write_exc})" print(f"ERROR: case failed: case_id={case.case_id} err={case_error}") + _emit_case_debug_footer( + case_id=case.case_id, + run_dir=run_dir, + summary_path=summary_path, + case_runs_path=case_runs_path, + reason="case_exception", + ) + case_debug_emitted = True outcome = RUN_OUTCOME_FAILED finally: - if case_family == CASE_FAMILY_INFER and resolved_case is not None: - try: - if infer_deploy_attempted: - 
_run_adapter_action( - resolved_case, - run_dir=run_dir, - action="teardown", - ) - except Exception as exc: - print( - "ERROR: teardown failed; stopping (no fallback). " - f"case_id={case.case_id} err={type(exc).__name__}: {exc}" - ) - raise SystemExit(1) if case_plan is not None and resolved_case is not None: try: _finalize_case_runtime( @@ -1076,11 +1061,17 @@ def main() -> None: "ERROR: teardown failed; stopping after finalize (no fallback). " f"case_id={case.case_id} err={finalize_error}" ) - if case_family == CASE_FAMILY_BENCH and outcome == RUN_OUTCOME_SUCCESS: - print( - "WARN: TEST_STACK finalize failed after terminal benchmark success; " - f"preserving SUCCESS outcome for case_id={case.case_id} finalize_err={finalize_error}" - ) + if _preserve_success_after_finalize_error(case_family=case_family, outcome=outcome): + if case_family == CASE_FAMILY_BENCH: + print( + "WARN: TEST_STACK finalize failed after terminal benchmark success; " + f"preserving SUCCESS outcome for case_id={case.case_id} finalize_err={finalize_error}" + ) + else: + print( + "WARN: CI finalize failed after terminal ci_runner success; " + f"preserving SUCCESS outcome for case_id={case.case_id} finalize_err={finalize_error}" + ) else: outcome = RUN_OUTCOME_FAILED if suite.run_mode == RUN_MODE_DEBUG_ONE_BY_ONE and outcome != RUN_OUTCOME_SUCCESS: @@ -1149,13 +1140,54 @@ def main() -> None: ) except Exception as exc: print(f"ERROR: failed to write/update summary.yaml: {exc}") + if not case_debug_emitted: + _emit_case_debug_footer( + case_id=case.case_id, + run_dir=run_dir, + summary_path=summary_path, + case_runs_path=case_runs_path, + reason="summary_update_error", + ) + case_debug_emitted = True raise SystemExit(1) if fatal_stop_after_finalize: + if not case_debug_emitted: + _emit_case_debug_footer( + case_id=case.case_id, + run_dir=run_dir, + summary_path=summary_path, + case_runs_path=case_runs_path, + reason="finalize_fatal_stop", + ) + case_debug_emitted = True raise SystemExit(1) + 
case_result_parts = [ + f"case_id={case.case_id}", + f"run_index={run_slot.run_index}", + f"outcome={outcome}", + f"counted={counted}", + f"summary={summary_path}", + ] + if case_error is not None: + case_result_parts.append(f"case_error={case_error}") + if finalize_error is not None: + case_result_parts.append(f"finalize_error={finalize_error}") + print("[CASE result] " + " ".join(case_result_parts), flush=True) + if outcome != RUN_OUTCOME_SUCCESS: + if not case_debug_emitted: + _emit_case_debug_footer( + case_id=case.case_id, + run_dir=run_dir, + summary_path=summary_path, + case_runs_path=case_runs_path, + reason="case_failed", + ) + case_debug_emitted = True suite_failed = True + failed_case_summaries.append(" ".join(case_result_parts)) # RUN_MODE_DEBUG_ONE_BY_ONE is intended for local iteration: stop at first failure. # RUN_MODE_FULL_ONCE should run the whole matrix so we can see every failing case # in one case_runs.yaml, then exit non-zero at the end. @@ -1163,8 +1195,113 @@ def main() -> None: raise SystemExit(1) if suite_failed: + print("[SUITE result] FAILED", flush=True) + for summary in failed_case_summaries: + print("[SUITE failed_case] " + summary, flush=True) + print(f"[SUITE artifacts] case_runs={case_runs_path}", flush=True) + _emit_suite_debug_footer( + reason="suite_failed", + case_runs=case_runs, + case_runs_path=case_runs_path, + scheduled=scheduled, + ) raise SystemExit(1) + print(f"[SUITE result] SUCCESS case_runs={case_runs_path}", flush=True) + _emit_suite_debug_footer( + reason="suite_success", + case_runs=case_runs, + case_runs_path=case_runs_path, + scheduled=scheduled, + ) + + +def _read_text_tail_for_debug(path: Path, *, max_bytes: int = _DEBUG_TAIL_MAX_BYTES) -> Optional[str]: + try: + if not path.exists(): + return None + with path.open("rb") as f: + f.seek(0, os.SEEK_END) + size = f.tell() + f.seek(max(0, size - int(max_bytes)), os.SEEK_SET) + data = f.read() + return data.decode("utf-8", errors="replace") + except Exception as exc: 
# noqa: BLE001 + return f"" + + +def _emit_debug_file_tail(label: str, path: Path, *, max_bytes: int = _DEBUG_TAIL_MAX_BYTES) -> None: + resolved = path.resolve() + text = _read_text_tail_for_debug(resolved, max_bytes=max_bytes) + print(f"[DEBUG file] label={label} path={resolved} exists={text is not None}", flush=True) + if text is None: + return + print(f"[DEBUG file_tail_begin] label={label} max_bytes={int(max_bytes)}", flush=True) + if text: + sys.stdout.write(text) + if not text.endswith("\n"): + sys.stdout.write("\n") + sys.stdout.flush() + print(f"[DEBUG file_tail_end] label={label}", flush=True) + + +def _emit_case_debug_footer( + *, + case_id: str, + run_dir: Path, + summary_path: Path, + case_runs_path: Path, + reason: str, +) -> None: + try: + print( + "[CASE debug] " + f"reason={reason} case_id={case_id} run_dir={run_dir.resolve()} " + f"summary={summary_path.resolve()} case_runs={case_runs_path.resolve()}", + flush=True, + ) + _emit_debug_file_tail("summary.yaml", summary_path) + _emit_debug_file_tail(_RUN_EXCEPTION_FILENAME, run_dir / _RUN_EXCEPTION_FILENAME) + _emit_debug_file_tail("ci_runner.exit_code", run_dir / "logs" / "ci_runner" / "exit_code.txt") + _emit_debug_file_tail("ci_runner.stdout", run_dir / "logs" / "ci_runner" / "stdout.log") + except Exception as exc: # noqa: BLE001 + print(f"[CASE debug] failed to emit debug footer: {type(exc).__name__}: {exc}", flush=True) + + +def _emit_suite_debug_footer( + *, + reason: str, + case_runs: Dict[str, Any], + case_runs_path: Path, + scheduled: List[_PlannedCase], +) -> None: + try: + print( + "[SUITE debug] " + f"version={_TEST_RUNNER_DIAGNOSTIC_VERSION} reason={reason} " + f"scheduled={len(scheduled)} case_runs={case_runs_path.resolve()}", + flush=True, + ) + run_map = _case_runs_map(case_runs) + for planned_case in scheduled: + case_id = planned_case.case.case_id + rec = run_map.get(case_id) + if rec is None: + print(f"[SUITE debug_case] case_id={case_id} missing_in_case_runs=true", flush=True) + 
continue + last_run = rec.get("last_run") + last_run_json = json.dumps(last_run, sort_keys=True, separators=(",", ":")) + print( + "[SUITE debug_case] " + f"case_id={case_id} counted={planned_case.counted} " + f"total_runs={rec.get('total_runs')} success_runs={rec.get('success_runs')} " + f"failed_runs={rec.get('failed_runs')} counted_runs={rec.get('counted_runs')} " + f"last_run={last_run_json}", + flush=True, + ) + except Exception as exc: # noqa: BLE001 + print(f"[SUITE debug] failed to emit debug footer: {type(exc).__name__}: {exc}", flush=True) + def _load_yaml_file(path: Path) -> Any: with path.open("r", encoding="utf-8") as f: @@ -2406,7 +2543,7 @@ def _resolved_case_ops_namespace(resolved_case: Dict[str, Any]) -> str: def _apply_stable_deploy_names(resolved_case: Dict[str, Any]) -> None: """Rewrite deploy.instances[].k8s_ref into a stable logical deployment name. - For CI/infer, replacement semantics follow the logical case identity and stay rerun-stable. + For CI cases, replacement semantics follow the logical case identity and stay rerun-stable. For TEST_STACK benchmark workloads, names are additionally scoped by run_dir hash so a stale controller/runtime from an older runner cannot collide with the current run. 
""" @@ -2433,8 +2570,6 @@ def _apply_stable_deploy_names(resolved_case: Dict[str, Any]) -> None: def _resolved_case_kind(resolved_case: Dict[str, Any]) -> str: scene = _require_dict(resolved_case.get("scene"), "resolved_case.scene") - if scene.get("infer") is not None: - return SCENE_KIND_INFER if scene.get("ci") is not None: return SCENE_KIND_CI if scene.get("test_stack") is not None: @@ -2445,7 +2580,7 @@ def _resolved_case_kind(resolved_case: Dict[str, Any]) -> str: def _resolved_case_family(resolved_case: Dict[str, Any]) -> str: case = _require_dict(resolved_case.get("case"), "resolved_case.case") family = _require_str(case.get("family"), "resolved_case.case.family") - if family not in (CASE_FAMILY_INFER, CASE_FAMILY_CI, CASE_FAMILY_BENCH): + if family not in (CASE_FAMILY_CI, CASE_FAMILY_BENCH): raise ValueError(f"resolved_case.case.family unsupported: {family!r}") return family @@ -2472,8 +2607,6 @@ def _ci_runtime_contract_id(resolved_case: Dict[str, Any]) -> str: def _case_family_id(case_kind: str) -> str: - if case_kind == SCENE_KIND_INFER: - return CASE_FAMILY_INFER if case_kind == SCENE_KIND_CI: return CASE_FAMILY_CI if case_kind == SCENE_KIND_TEST_STACK: @@ -2515,7 +2648,7 @@ def _close_case_runtime_locks(runtime_tracking: _CaseRuntimeTracking) -> None: def _build_runtime_model(case_family: str) -> Dict[str, Any]: if case_family == CASE_FAMILY_CI: case_instance_ids = list(CI_RUNTIME_LAYER_INSTANCE_IDS[RUNTIME_LAYER_CASE]) - elif case_family in (CASE_FAMILY_INFER, CASE_FAMILY_BENCH): + elif case_family == CASE_FAMILY_BENCH: case_instance_ids = [] else: raise ValueError(f"unsupported runtime model case family: {case_family}") @@ -2616,9 +2749,6 @@ def _compile_case_runtime_artifacts( test_stack_meta = _compile_test_stack_case(resolved_case, run_index=run_index) _sync_case_runtime_model_from_deploy(resolved_case) return test_stack_meta - if case_family == CASE_FAMILY_INFER: - _sync_case_runtime_model_from_deploy(resolved_case) - return None raise 
ValueError(f"unsupported case family for runtime artifact compilation: {case_family}") @@ -3063,16 +3193,6 @@ def _deploy_runtime_phase( return _deploy_runtime_phase_after_stage(resolved_case, run_dir=run_dir, phase=phase) -def _collect_runtime_phase( - resolved_case: Dict[str, Any], - *, - run_dir: Path, - phase: _RuntimePhase, -) -> None: - _write_runtime_phase_inputs(resolved_case, run_dir=run_dir, phase=phase) - _run_adapter_action(resolved_case, run_dir=run_dir, action="collect") - - def _ci_cluster_runtime_stage(resolved_case: Dict[str, Any]) -> _RemoteRunDirStage: verify_relpaths = list(CI_CLUSTER_RUNTIME_REMOTE_STAGE_VERIFY_RELPATHS) if _ci_has_instance(resolved_case, instance_id="owner_0"): @@ -3126,12 +3246,6 @@ def _ci_runtime_phase(resolved_case: Dict[str, Any], phase_id: str) -> _RuntimeP write_ctx="CI", stage_run_dir=_ci_runner_runtime_stage(resolved_case), ), - "collect_all": _RuntimePhase( - phase_id="collect_all", - layer=RUNTIME_LAYER_CASE, - instance_ids=CI_RUNTIME_INSTANCE_IDS, - write_ctx="CI", - ), } try: return phases[phase_id] @@ -3183,24 +3297,6 @@ def _test_stack_runtime_phase( write_ctx="TEST_STACK", stage_run_dir=stage_run_dir, ) - if phase_id == "collect_nodes": - if node_ids is None or not node_ids: - raise ValueError("TEST_STACK collect_nodes phase requires non-empty node_ids") - return _RuntimePhase( - phase_id="collect_nodes", - layer=RUNTIME_LAYER_CASE, - instance_ids=node_ids, - write_ctx="TEST_STACK", - ) - if phase_id == "collect_coordinator": - if node_ids is not None: - raise ValueError("TEST_STACK collect_coordinator phase does not accept node_ids") - return _RuntimePhase( - phase_id="collect_coordinator", - layer=RUNTIME_LAYER_CASE, - instance_ids=("coordinator",), - write_ctx="TEST_STACK", - ) raise ValueError(f"unsupported TEST_STACK runtime phase: {phase_id}") @@ -3228,14 +3324,6 @@ def _compile_case_plan(resolved_case: Dict[str, Any]) -> _CasePlan: execute_phases=( _ci_runtime_phase(resolved_case, "ci_runner"), ), - 
collect_phases=( - _RuntimePhase( - phase_id="collect_all", - layer=RUNTIME_LAYER_CASE, - instance_ids=case_instance_ids, - write_ctx="CI", - ), - ), ) if case_family == CASE_FAMILY_BENCH: deploy = _require_dict(resolved_case.get("deploy"), "resolved_case.deploy") @@ -3292,15 +3380,6 @@ def _compile_case_plan(resolved_case: Dict[str, Any]) -> _CasePlan: include_stage_run_dir=False, ), ), - collect_phases=( - _test_stack_runtime_phase(phase_id="collect_nodes", node_ids=node_ids_tuple), - _RuntimePhase( - phase_id="collect_coordinator", - layer=RUNTIME_LAYER_CASE, - instance_ids=prepare_ids_tuple, - write_ctx="TEST_STACK", - ), - ), ) raise ValueError(f"unsupported case family for case plan: {case_family}") @@ -3502,12 +3581,14 @@ def _finalize_ci_case_runtime( def _finalize_test_stack_case_runtime( resolved_case: Dict[str, Any], *, + run_dir: Path, runtime_tracking: _CaseRuntimeTracking, outcome: str, ) -> None: _finalize_test_stack_case_runtime_impl( ctx=sys.modules[__name__], resolved_case=resolved_case, + run_dir=run_dir, runtime_tracking=runtime_tracking, outcome=outcome, ) @@ -3652,6 +3733,14 @@ def _resolved_run_dir_path(resolved_case: Dict[str, Any]) -> Path: return Path(_require_str(runtime.get("run_dir"), "runtime.run_dir")).resolve() +def _resolved_case_run_index(resolved_case: Dict[str, Any]) -> int: + run_dir = _resolved_run_dir_path(resolved_case) + run_index = _ui_parse_run_index(run_dir.name) + if run_index is None: + raise ValueError(f"resolved_case.runtime.run_dir must end with run_: {run_dir}") + return int(run_index) + + def _ci_share_mem_path(resolved_case: Dict[str, Any], *, run_dir: Path) -> str: runtime = _require_dict(resolved_case.get("runtime"), "resolved_case.runtime") stack_identity = _require_dict(runtime.get("stack_identity"), "resolved_case.runtime.stack_identity") @@ -3893,10 +3982,62 @@ def _ci_local_runtime_targets(resolved_case: Dict[str, Any]) -> set[str]: return out +def _ci_kv_master_port(resolved_case: Dict[str, Any]) -> 
Optional[int]: + profile = _require_dict(resolved_case.get("profile"), "resolved_case.profile") + profile_test_stack = profile.get("test_stack") + if profile_test_stack is None: + return None + profile_test_stack = _require_dict(profile_test_stack, "resolved_case.profile.test_stack") + backend_kind = _require_test_stack_backend_kind( + profile_test_stack.get("kind"), + "resolved_case.profile.test_stack.kind", + ) + if backend_kind != TEST_STACK_BACKEND_FLUXON: + return None + port_alloc = _require_dict(profile_test_stack.get("port_alloc"), "profile.test_stack.port_alloc") + kv_master_port_base = _require_int( + port_alloc.get("kv_master_port_base"), + "profile.test_stack.port_alloc.kv_master_port_base", + min_v=1, + ) + kv_master_port_stride = _require_int( + port_alloc.get("kv_master_port_stride"), + "profile.test_stack.port_alloc.kv_master_port_stride", + min_v=1, + ) + run_index = _resolved_case_run_index(resolved_case) + runner_root = _test_stack_runner_root(_resolved_run_dir_path(resolved_case)) + master_port_slot_offset = _test_stack_runner_port_slot( + runner_root=runner_root, + stride=kv_master_port_stride, + ) + kv_master_port = ( + int(kv_master_port_base) + + int(kv_master_port_stride) * int(run_index - 1) + + int(master_port_slot_offset) + ) + if kv_master_port <= 0 or kv_master_port > 65535: + raise ValueError(f"computed kv_master_port out of range: {kv_master_port}") + return int(kv_master_port) + + def _ci_required_ports(resolved_case: Dict[str, Any]) -> List[Tuple[str, int]]: resolved_case = _ci_runtime_cleanup_case(resolved_case, ctx="CI required ports") - _ = _ci_local_runtime_targets(resolved_case) - return [] + local_targets = _ci_local_runtime_targets(resolved_case) + if not local_targets: + return [] + required_ports: List[Tuple[str, int]] = [] + if "master" in set(_ci_case_instance_ids(resolved_case)): + master_instance = _find_deploy_instance(resolved_case, instance_id="master") + master_target = _require_str( + 
_require_dict(master_instance.get("deployer"), "master.deployer").get("target"), + "master.target", + ) + if master_target in local_targets: + kv_master_port = _ci_kv_master_port(resolved_case) + if kv_master_port is not None: + required_ports.append(("ci master", int(kv_master_port))) + return required_ports def _ci_assert_ports_free(resolved_case: Dict[str, Any]) -> None: @@ -6933,6 +7074,162 @@ def _runner_native_ci_commands_for_case(case: _ResolvedCase, *, ctx: str) -> Lis "timeout_seconds": 21600, } ] + if scene_id == "ci_top_attention_cargo_fs_core": + return [ + { + "id": "top_attention_cargo_fs_core", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_fs_core.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_util": + return [ + { + "id": "top_attention_cargo_util", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_util.py " + "--case-config __RUN_DIR__/configs/ci_scene_config.yaml" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_kv_unit": + return [ + { + "id": "top_attention_cargo_kv_unit", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_kv_unit.py " + "--case-config __RUN_DIR__/configs/ci_scene_config.yaml" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_cli": + return [ + { + "id": "top_attention_cargo_cli", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_cli.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_commu": + return [ + { + "id": "top_attention_cargo_commu", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_commu.py" + ), + "timeout_seconds": 21600, + } + ] 
+ if scene_id == "ci_top_attention_cargo_commu_contract": + return [ + { + "id": "top_attention_cargo_commu_contract", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_commu_contract.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_framework": + return [ + { + "id": "top_attention_cargo_framework", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_framework.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_fs": + return [ + { + "id": "top_attention_cargo_fs", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_fs.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_fs_s3_gateway": + return [ + { + "id": "top_attention_cargo_fs_s3_gateway", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_fs_s3_gateway.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_limit_thirdparty": + return [ + { + "id": "top_attention_cargo_limit_thirdparty", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_limit_thirdparty.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_mq": + return [ + { + "id": "top_attention_cargo_mq", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_mq.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_observability": + return [ + { + "id": "top_attention_cargo_observability", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_observability.py" + ), + "timeout_seconds": 
21600, + } + ] + if scene_id == "ci_top_attention_cargo_ops": + return [ + { + "id": "top_attention_cargo_ops", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_ops.py" + ), + "timeout_seconds": 21600, + } + ] + if scene_id == "ci_top_attention_cargo_pyo3": + return [ + { + "id": "top_attention_cargo_pyo3", + "command": ( + "__RUN_DIR__/venv/bin/python3 -u " + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_pyo3.py" + ), + "timeout_seconds": 21600, + } + ] if scene_id == "ci_top_attention_log_mgmt": return [ { @@ -7828,6 +8125,23 @@ def _build_resolved_case_yaml( "runtime": selected_runtime, }, } + profile_ts = profile_runtime_src.get("test_stack") + if profile_ts is not None: + profile_ts = _require_dict(profile_ts, "resolved_case.profile_source.runtime.test_stack") + backend_kind = _require_test_stack_backend_kind( + profile_ts.get("kind"), + "resolved_case.profile_source.test_stack.kind", + ) + port_alloc = _resolve_test_stack_port_alloc( + profile_ts.get("port_alloc"), + topology=topology, + backend_kind=backend_kind, + ctx="resolved_case.profile_source.test_stack.port_alloc", + ) + profile["test_stack"] = { + "kind": backend_kind, + "port_alloc": port_alloc, + } if selected_scene_config is not None: profile["ci"]["scene_config"] = selected_scene_config elif case_family == CASE_FAMILY_BENCH: @@ -11010,9 +11324,10 @@ def _ci_cleanup_runtime( resolved_case: Dict[str, Any], *, timeout_s: int, + instance_ids: Optional[List[str]] = None, ) -> None: cleanup_case = _ci_runtime_cleanup_case(resolved_case, ctx="CI cleanup runtime") - for entry in _ci_runtime_current_apply_ids(cleanup_case): + for entry in _ci_runtime_current_apply_ids(cleanup_case, instance_ids=instance_ids): apply_id = _require_str(entry.get("apply_id"), "current_apply_entry.apply_id") instance_ids = _require_list(entry.get("instance_ids"), "current_apply_entry.instance_ids") instance_id_text = ",".join( @@ 
-11245,7 +11560,7 @@ def _run_adapter_action( run_dir: Path, action: str, ) -> Optional[Dict[str, Any]]: - if action not in ("deploy", "collect", "teardown"): + if action not in ("deploy", "teardown"): raise ValueError(f"invalid adapter action: {action}") deploy = _require_dict(resolved_case.get("deploy"), "resolved_case.deploy") @@ -11278,7 +11593,14 @@ def _run_subprocess(argv: List[str], *, cwd: str) -> None: print("RUN:", " ".join(_shell_quote(a) for a in argv), flush=True) proc = subprocess.run(argv, cwd=cwd) if proc.returncode != 0: - raise RuntimeError(f"command failed: rc={proc.returncode}") + raise RuntimeError( + "command failed: " + f"rc={proc.returncode} cwd={cwd} argv={' '.join(_shell_quote(a) for a in argv)}" + ) + + +def _preserve_success_after_finalize_error(*, case_family: str, outcome: str) -> bool: + return outcome == RUN_OUTCOME_SUCCESS and case_family in (CASE_FAMILY_BENCH, CASE_FAMILY_CI) _SSH_TRANSPORT_TIMEOUT_SECONDS = 180.0 @@ -11776,11 +12098,16 @@ def workload_name_matches(name: str) -> bool: time.sleep(1.0) -def _ci_runtime_current_apply_ids(resolved_case: Dict[str, Any]) -> List[Dict[str, Any]]: +def _ci_runtime_current_apply_ids( + resolved_case: Dict[str, Any], + *, + instance_ids: Optional[List[str]] = None, +) -> List[Dict[str, Any]]: cleanup_case = _ci_runtime_cleanup_case(resolved_case, ctx="CI current runtime apply ids") deploy = _require_dict(cleanup_case.get("deploy"), "resolved_case.deploy") controller_url = _require_str(deploy.get("controller_url"), "deploy.controller_url").rstrip("/") deploy_instances = _require_list(deploy.get("instances"), "resolved_case.deploy.instances") + allowed_instance_ids = None if instance_ids is None else set(instance_ids) workload_to_instance_ids: Dict[Tuple[str, str], List[str]] = {} for raw in deploy_instances: @@ -11788,6 +12115,8 @@ def _ci_runtime_current_apply_ids(resolved_case: Dict[str, Any]) -> List[Dict[st instance_id = _require_str(inst.get("id"), "resolved_case.deploy.instances[].id") 
if instance_id not in set(_ci_case_instance_ids(cleanup_case)): continue + if allowed_instance_ids is not None and instance_id not in allowed_instance_ids: + continue k8s_ref = _require_str(inst.get("k8s_ref"), f"{instance_id}.k8s_ref") kind, name = _ops_kind_from_k8s_ref(k8s_ref, ctx=f"{instance_id}.k8s_ref") key = (kind, name) @@ -13894,11 +14223,14 @@ def _write_ci_master_owner_configs( owner_dram_bytes: int, ) -> tuple[Path, Path]: owner_work_root = run_dir / "services" / "owner_0" + kv_master_port = _ci_kv_master_port(resolved_case) + if kv_master_port is None: + raise ValueError("CI cluster runtime requires resolved_case.profile.test_stack port_alloc for ci_master") master_cfg = { "etcd_endpoints": ["__ETCD__"], "cluster_name": cluster_name, "instance_key": "ci_master", - "port": 50052, + "port": int(kv_master_port), "monitoring": { "prometheus_base_url": "__PROM_BASE__", "prom_remote_write_url": ["__PROM_WRITE__"], @@ -14768,7 +15100,7 @@ def _print_ci_wait_progress( last_offset: int, next_heartbeat_at: float, deadline: float, -) -> tuple[int, float]: +) -> tuple[int, float, str]: now = time.time() next_offset, chunk = _ci_wait_progress_tail( resolved_case, @@ -14780,7 +15112,7 @@ def _print_ci_wait_progress( if text: sys.stdout.write(_ci_log_prefix_lines(text + "\n", now=now)) sys.stdout.flush() - return next_offset, now + _CI_WAIT_HEARTBEAT_INTERVAL_SECONDS + return next_offset, now + _CI_WAIT_HEARTBEAT_INTERVAL_SECONDS, chunk if now >= next_heartbeat_at: remaining_s = max(0, int(deadline - now)) print( @@ -14789,8 +15121,8 @@ def _print_ci_wait_progress( f"log={str((run_dir / 'logs' / 'ci_runner' / 'stdout.log').resolve())}", flush=True, ) - return next_offset, now + _CI_WAIT_HEARTBEAT_INTERVAL_SECONDS - return next_offset, next_heartbeat_at + return next_offset, now + _CI_WAIT_HEARTBEAT_INTERVAL_SECONDS, "" + return next_offset, next_heartbeat_at, "" def _instance_file_exists( @@ -15149,6 +15481,124 @@ def _wait_instance_exit( time.sleep(2.0) +def 
_parse_ci_runner_exit_code_text(*, raw: str, path: Path, ctx: str) -> int: + try: + rc = int(raw.strip()) + except ValueError as exc: + raise ValueError(f"{ctx}: path={path} raw={raw!r}") from exc + return _require_int(rc, ctx, min_v=-255) + + +def _parse_ci_runner_stdout_terminal_exit_code( + *, + raw: str, + path: Path, + ctx: str, +) -> Optional[int]: + matches = list(CI_RUNNER_STDOUT_TERMINAL_EXIT_CODE_RE.finditer(raw)) + if not matches: + return None + rc_text = matches[-1].group(1) + return _parse_ci_runner_exit_code_text(raw=rc_text, path=path, ctx=ctx) + + +def _read_ci_runner_stdout_terminal_exit_code_if_present( + *, + resolved_case: Dict[str, Any], + run_dir: Path, + ctx: str, +) -> Optional[int]: + stdout_path = (run_dir / "logs" / "ci_runner" / "stdout.log").resolve() + stdout_raw = _instance_read_text_if_present( + resolved_case, + instance_id="ci_runner", + path=stdout_path, + ) + if stdout_raw is None: + return None + return _parse_ci_runner_stdout_terminal_exit_code( + raw=stdout_raw, + path=stdout_path, + ctx=ctx, + ) + + +def _read_ci_runner_exit_code_if_present( + *, + resolved_case: Dict[str, Any], + run_dir: Path, + baseline_state: Optional[_ObservedFileState], + local_ctx: str, + remote_ctx: str, +) -> Optional[int]: + exit_code_path = (run_dir / "logs" / "ci_runner" / "exit_code.txt").resolve() + current_state = _observe_file_state(exit_code_path) + if _has_new_file_state(before=baseline_state, after=current_state): + raw = exit_code_path.read_text(encoding="utf-8") + if not raw.strip(): + return _read_ci_runner_stdout_terminal_exit_code_if_present( + resolved_case=resolved_case, + run_dir=run_dir, + ctx=local_ctx + ".stdout", + ) + return _parse_ci_runner_exit_code_text( + raw=raw, + path=exit_code_path, + ctx=local_ctx, + ) + remote_raw = _instance_read_text_if_present( + resolved_case, + instance_id="ci_runner", + path=exit_code_path, + ) + if remote_raw is None: + return _read_ci_runner_stdout_terminal_exit_code_if_present( + 
resolved_case=resolved_case, + run_dir=run_dir, + ctx=remote_ctx + ".stdout", + ) + if not remote_raw.strip(): + return _read_ci_runner_stdout_terminal_exit_code_if_present( + resolved_case=resolved_case, + run_dir=run_dir, + ctx=remote_ctx + ".stdout", + ) + return _parse_ci_runner_exit_code_text( + raw=remote_raw, + path=exit_code_path, + ctx=remote_ctx, + ) + + +def _wait_ci_runner_exit_code_file_after_terminal_status( + *, + resolved_case: Dict[str, Any], + run_dir: Path, + baseline_state: Optional[_ObservedFileState], + status_exit_code: int, +) -> Optional[int]: + deadline = time.time() + float(CI_RUNNER_TERMINAL_EXIT_CODE_FILE_GRACE_S) + while True: + rc = _read_ci_runner_exit_code_if_present( + resolved_case=resolved_case, + run_dir=run_dir, + baseline_state=baseline_state, + local_ctx="ci_runner.exit_code", + remote_ctx="ci_runner.remote_exit_code", + ) + if rc is not None: + if rc != int(status_exit_code): + print( + "[CI wait exit_code] controller reported terminal process exit before exit_code.txt " + f"became readable; preferring exit_code.txt rc={rc} controller_exit_code={status_exit_code}", + flush=True, + ) + return rc + if time.time() >= deadline: + return None + time.sleep(0.5) + + def _wait_ci_runner_exit_code_resume( *, resolved_case: Dict[str, Any], @@ -15172,19 +15622,15 @@ def _wait_ci_runner_exit_code_resume( deadline = time.time() + float(timeout_s) last_status_err: str | None = None while True: - raw = _instance_read_text_if_present( - resolved_case, - instance_id="ci_runner", - path=exit_code_path, + rc_from_file = _read_ci_runner_exit_code_if_present( + resolved_case=resolved_case, + run_dir=run_dir, + baseline_state=None, + local_ctx="ci_runner.resume_exit_code", + remote_ctx="ci_runner.resume_exit_code", ) - if raw is not None: - try: - rc = int(raw.strip()) - except ValueError as exc: - raise ValueError( - f"ci_runner remote exit_code file is not an int: path={exit_code_path} raw={raw!r}" - ) from exc - return _require_int(rc, 
"ci_runner.resume_exit_code", min_v=-255) + if rc_from_file is not None: + return rc_from_file try: status = _instance_status(resolved_case, instance_id="ci_runner") @@ -15201,7 +15647,16 @@ def _wait_ci_runner_exit_code_resume( continue status_exit_code = status.get("exit_code") if status.get("ok") is True and status.get("running") is False and isinstance(status_exit_code, int): - return _require_int(status_exit_code, "ci_runner.resume.status.exit_code", min_v=-255) + status_exit_code_i = _require_int(status_exit_code, "ci_runner.resume.status.exit_code", min_v=-255) + rc_after_grace = _wait_ci_runner_exit_code_file_after_terminal_status( + resolved_case=resolved_case, + run_dir=run_dir, + baseline_state=None, + status_exit_code=status_exit_code_i, + ) + if rc_after_grace is not None: + return rc_after_grace + return status_exit_code_i if status.get("ok") is True and status.get("running") is False: # Deterministic behavior: # - If controller no longer reports desired workloads for this case, the CI runner cannot start. 
@@ -15245,38 +15700,35 @@ def _wait_ci_runner_exit_code( last_status_err: str | None = None log_offset = 0 next_heartbeat_at = 0.0 + stdout_terminal_tail = "" + stdout_path = (run_dir / "logs" / "ci_runner" / "stdout.log").resolve() while True: - log_offset, next_heartbeat_at = _print_ci_wait_progress( + log_offset, next_heartbeat_at, stdout_chunk = _print_ci_wait_progress( resolved_case, run_dir=run_dir, last_offset=log_offset, next_heartbeat_at=next_heartbeat_at, deadline=deadline, ) + if stdout_chunk: + stdout_terminal_tail = (stdout_terminal_tail + stdout_chunk)[-4096:] + rc_from_stdout_tail = _parse_ci_runner_stdout_terminal_exit_code( + raw=stdout_terminal_tail, + path=stdout_path, + ctx="ci_runner.stdout_progress", + ) + if rc_from_stdout_tail is not None: + return rc_from_stdout_tail current_state = _observe_file_state(exit_code_path) - if _has_new_file_state(before=baseline_state, after=current_state): - raw = exit_code_path.read_text(encoding="utf-8").strip() - try: - rc = int(raw) - except ValueError as exc: - raise ValueError( - f"ci_runner exit_code file is not an int: path={exit_code_path} raw={raw!r}" - ) from exc - return _require_int(rc, "ci_runner.exit_code", min_v=-255) - remote_raw = _instance_read_text_if_present( - resolved_case, - instance_id="ci_runner", - path=exit_code_path, + rc_from_file = _read_ci_runner_exit_code_if_present( + resolved_case=resolved_case, + run_dir=run_dir, + baseline_state=baseline_state, + local_ctx="ci_runner.exit_code", + remote_ctx="ci_runner.remote_exit_code", ) - if remote_raw is not None: - raw = remote_raw.strip() - try: - rc = int(raw) - except ValueError as exc: - raise ValueError( - f"ci_runner remote exit_code file is not an int: path={exit_code_path} raw={raw!r}" - ) from exc - return _require_int(rc, "ci_runner.remote_exit_code", min_v=-255) + if rc_from_file is not None: + return rc_from_file try: status = _instance_status(resolved_case, instance_id="ci_runner") except _HttpGetJsonTransientError as exc: 
@@ -15290,7 +15742,16 @@ def _wait_ci_runner_exit_code( continue status_exit_code = status.get("exit_code") if status.get("ok") is True and status.get("running") is False and isinstance(status_exit_code, int): - return _require_int(status_exit_code, "ci_runner.status.exit_code", min_v=-255) + status_exit_code_i = _require_int(status_exit_code, "ci_runner.status.exit_code", min_v=-255) + rc_after_grace = _wait_ci_runner_exit_code_file_after_terminal_status( + resolved_case=resolved_case, + run_dir=run_dir, + baseline_state=baseline_state, + status_exit_code=status_exit_code_i, + ) + if rc_after_grace is not None: + return rc_after_grace + return status_exit_code_i if status.get("ok") is True and status.get("running") is False: # Deterministic behavior: # - If controller no longer reports desired workloads for this case, the CI runner cannot start. diff --git a/fluxon_test_stack/test_runner_ci_runtime.py b/fluxon_test_stack/test_runner_ci_runtime.py index bef19e2..281843f 100644 --- a/fluxon_test_stack/test_runner_ci_runtime.py +++ b/fluxon_test_stack/test_runner_ci_runtime.py @@ -11,12 +11,45 @@ def _ci_runtime_python_executable() -> str: - python_bin = shutil.which(_CI_RUNTIME_PYTHON_BIN_NAME) - if python_bin is None: + candidates = [] + seen: set[str] = set() + for raw_candidate in ( + _CI_RUNTIME_PYTHON_BIN_NAME, + "python3", + "python", + ): + resolved = shutil.which(raw_candidate) + if resolved is None or resolved in seen: + continue + seen.add(resolved) + candidates.append(resolved) + if not candidates: raise ValueError( - "CI runtime requires python3.10 on PATH to create the offline-wheelhouse venv" + "CI runtime requires a Python 3.10 interpreter on PATH to create the offline-wheelhouse venv" ) - return python_bin + for python_bin in candidates: + if _python_executable_abi(python_bin) == _TEST_STACK_DEFAULT_PYTHON_ABI: + return python_bin + raise ValueError( + "CI runtime requires a Python 3.10 interpreter on PATH to create the offline-wheelhouse venv" + ) + 
+ +def _python_executable_abi(python_bin: str) -> str: + try: + return subprocess.check_output( + [ + python_bin, + "-c", + ( + "import sys; " + "print(f'{sys.implementation.name}{sys.version_info[0]}.{sys.version_info[1]}')" + ), + ], + text=True, + ).strip() + except (OSError, subprocess.CalledProcessError) as exc: + raise ValueError(f"failed to probe python ABI for executable: {python_bin}") from exc def _ci_runtime_python_abi( @@ -67,9 +100,13 @@ def _create_ci_runtime_venv( if venv_dir.exists(): raise ValueError(f"venv dir already exists (no overwrite): {venv_dir}") python_bin = _ci_runtime_python_executable() - run_subprocess([python_bin, "-m", "venv", str(venv_dir)]) + # Skip venv's implicit ensurepip step, then seed pip explicitly so the venv stays + # self-contained and does not depend on host site-packages. + run_subprocess([python_bin, "-m", "venv", "--without-pip", str(venv_dir)]) venv_python = venv_dir / "bin" / "python3" if not venv_python.exists(): raise ValueError(f"venv python not found after creation: {venv_python}") + run_subprocess([str(venv_python), "-m", "ensurepip", "--upgrade", "--default-pip"]) + run_subprocess([str(venv_python), "-m", "pip", "--version"]) assert_python_abi(venv_python) return venv_python diff --git a/fluxon_test_stack/test_runner_models.py b/fluxon_test_stack/test_runner_models.py index cb38467..dcb3a5c 100644 --- a/fluxon_test_stack/test_runner_models.py +++ b/fluxon_test_stack/test_runner_models.py @@ -85,7 +85,6 @@ class _CasePlan: case_family: str prepare_phases: Tuple[_RuntimePhase, ...] execute_phases: Tuple[_RuntimePhase, ...] - collect_phases: Tuple[_RuntimePhase, ...] 
@dataclass(frozen=True) diff --git a/fluxon_test_stack/test_runner_runtime_backend.py b/fluxon_test_stack/test_runner_runtime_backend.py index 14a85e4..0dce31c 100644 --- a/fluxon_test_stack/test_runner_runtime_backend.py +++ b/fluxon_test_stack/test_runner_runtime_backend.py @@ -378,13 +378,12 @@ def _execute_ci_case( ), ) outcome = ctx.RUN_OUTCOME_SUCCESS if rc == 0 else ctx.RUN_OUTCOME_FAILED - if outcome == ctx.RUN_OUTCOME_SUCCESS and runtime_tracking.ci_apply_ids.get("ci_runner") is not None: - ctx._delete_apply_id( - resolved_case, - apply_id=ctx._require_str(runtime_tracking.ci_apply_ids.get("ci_runner"), "CI ci_runner apply_id"), - ctx="CI ci_runner apply", + if outcome == ctx.RUN_OUTCOME_SUCCESS: + _finalize_terminal_ci_runner_success( + ctx=ctx, + resolved_case=resolved_case, + runtime_tracking=runtime_tracking, ) - del runtime_tracking.ci_apply_ids["ci_runner"] summary = ctx._build_ci_summary_yaml( resolved_case, run_index=run_index, @@ -394,11 +393,32 @@ def _execute_ci_case( counted=False, ci_out={"rc": rc}, ) - for phase in prepared_case.plan.collect_phases: - ctx._collect_runtime_phase(resolved_case, run_dir=run_dir, phase=phase) return ctx._ExecutedCase(outcome=outcome, summary=summary) +def _finalize_terminal_ci_runner_success( + *, + ctx: Any, + resolved_case: Dict[str, Any], + runtime_tracking: Any, +) -> None: + apply_id = runtime_tracking.ci_apply_ids.pop("ci_runner", None) + if apply_id is None: + return + try: + ctx._delete_apply_id( + resolved_case, + apply_id=ctx._require_str(apply_id, "CI ci_runner apply_id"), + ctx="CI ci_runner terminal success apply", + ) + except Exception as exc: # noqa: BLE001 + print( + "WARN: CI ci_runner terminal success cleanup failed; " + f"preserving terminal test result and excluding ci_runner from finalize tracking: {type(exc).__name__}: {exc}", + flush=True, + ) + + def _execute_test_stack_case( *, ctx: Any, @@ -414,7 +434,6 @@ def _execute_test_stack_case( outcome = ctx.RUN_OUTCOME_FAILED error_detail: 
Optional[str] = None - collect_error_detail: Optional[str] = None result_obj: Optional[Dict[str, Any]] = None try: @@ -445,12 +464,6 @@ def _execute_test_stack_case( outcome = ctx.RUN_OUTCOME_SUCCESS except Exception as exc: # noqa: BLE001 error_detail = f"{type(exc).__name__}: {exc}" - finally: - try: - for phase in prepared_case.plan.collect_phases: - ctx._collect_runtime_phase(resolved_case, run_dir=run_dir, phase=phase) - except Exception as exc: # noqa: BLE001 - collect_error_detail = f"{type(exc).__name__}: {exc}" summary = { "schema_version": ctx.SCHEMA_VERSION, @@ -472,7 +485,7 @@ def _execute_test_stack_case( "result_path": str(_require_test_stack_result_path(prepared_case.test_stack_result_path)), "result": result_obj, "error": error_detail, - "collect_error": collect_error_detail, + "collect_error": None, }, } return ctx._ExecutedCase(outcome=outcome, summary=summary) @@ -543,6 +556,7 @@ def _finalize_case_runtime( _finalize_test_stack_case_runtime( ctx=ctx, resolved_case=resolved_case, + run_dir=run_dir, runtime_tracking=runtime_tracking, outcome=outcome, ) @@ -579,6 +593,7 @@ def _finalize_ci_case_runtime( should_teardown = outcome == ctx.RUN_OUTCOME_SUCCESS or run_mode == ctx.RUN_MODE_FULL_ONCE if should_teardown: (run_dir / ctx.CI_PRESERVED_APPLY_IDS_FILENAME).unlink(missing_ok=True) + cleanup_instance_ids: list[str] = [] for entry in reversed(tracked_apply_entries): apply_id = ctx._require_str(entry.get("apply_id"), "ci tracked apply entry.apply_id") instance_ids = ctx._require_list(entry.get("instance_ids"), "ci tracked apply entry.instance_ids") @@ -586,12 +601,16 @@ def _finalize_ci_case_runtime( ctx._require_str(raw_instance_id, "ci tracked apply entry.instance_ids[]") for raw_instance_id in instance_ids ) + cleanup_instance_ids.extend( + ctx._require_str(raw_instance_id, "ci tracked apply entry.instance_ids[]") + for raw_instance_id in instance_ids + ) ctx._delete_apply_id( resolved_case, apply_id=apply_id, ctx=f"CI {instance_id_text} apply", ) 
- ctx._ci_cleanup_runtime(resolved_case, timeout_s=120) + ctx._ci_cleanup_runtime(resolved_case, timeout_s=120, instance_ids=cleanup_instance_ids) return if not ci_preserved_apply_ids: return @@ -617,11 +636,25 @@ def _finalize_test_stack_case_runtime( *, ctx: Any, resolved_case: Dict[str, Any], + run_dir: Path, runtime_tracking: Any, outcome: str, ) -> None: case = ctx._require_dict(resolved_case.get("case"), "resolved_case.case") run_mode = ctx._require_str(case.get("run_mode"), "resolved_case.case.run_mode") + collect_error_detail: Optional[str] = None + + try: + # Collect first so failed runs still retain instance status snapshots before teardown. + ctx._run_adapter_action(resolved_case, run_dir=run_dir, action="collect") + except Exception as exc: # noqa: BLE001 + collect_error_detail = f"{type(exc).__name__}: {exc}" + summary_path = (run_dir / "summary.yaml").resolve() + summary = ctx._require_dict(ctx._load_yaml_file(summary_path), "summary.yaml") + test_stack_summary = ctx._require_dict(summary.get("test_stack"), "summary.yaml.test_stack") + test_stack_summary["collect_error"] = collect_error_detail + ctx._write_yaml_file(summary_path, summary) + ts_preserved_apply_ids: list[str] = [] if runtime_tracking.ts_nodes_deploy_attempted and runtime_tracking.ts_nodes_apply_id is not None: ts_preserved_apply_ids.append( diff --git a/fluxon_test_stack/tests/test_ci_2_virt_node_contract.py b/fluxon_test_stack/tests/test_ci_2_virt_node_contract.py index 6ebbecd..b861806 100644 --- a/fluxon_test_stack/tests/test_ci_2_virt_node_contract.py +++ b/fluxon_test_stack/tests/test_ci_2_virt_node_contract.py @@ -28,6 +28,18 @@ def _load_module(): class TestCi2VirtNodeContract(unittest.TestCase): _KVTEST_SCENE_ID = "ci_top_attention_bin_kvtest" + _CARGO_KV_UNIT_SCENE_ID = "ci_top_attention_cargo_kv_unit" + _CARGO_CLI_SCENE_ID = "ci_top_attention_cargo_cli" + _CARGO_COMMU_SCENE_ID = "ci_top_attention_cargo_commu" + _CARGO_COMMU_CONTRACT_SCENE_ID = 
"ci_top_attention_cargo_commu_contract" + _CARGO_FRAMEWORK_SCENE_ID = "ci_top_attention_cargo_framework" + _CARGO_FS_SCENE_ID = "ci_top_attention_cargo_fs" + _CARGO_FS_S3_GATEWAY_SCENE_ID = "ci_top_attention_cargo_fs_s3_gateway" + _CARGO_LIMIT_THIRDPARTY_SCENE_ID = "ci_top_attention_cargo_limit_thirdparty" + _CARGO_MQ_SCENE_ID = "ci_top_attention_cargo_mq" + _CARGO_OBSERVABILITY_SCENE_ID = "ci_top_attention_cargo_observability" + _CARGO_OPS_SCENE_ID = "ci_top_attention_cargo_ops" + _CARGO_PYO3_SCENE_ID = "ci_top_attention_cargo_pyo3" _DOC_SCENE_ID = "ci_top_attention_doc_page_build" _LOG_MGMT_SCENE_ID = "ci_top_attention_log_mgmt" _MQ_SCENE_ID = "ci_top_attention_mq_core" @@ -193,6 +205,62 @@ def test_generated_suite_preserves_source_scene_configs(self) -> None: "p2p_only", ) + def test_generated_suite_injects_public_transport_feature_for_cargo_kv_unit(self) -> None: + suite_cfg = _ENTRY._load_yaml_mapping(_ENTRY.DEFAULT_SUITE_PATH, ctx="suite") + generated = _ENTRY._rewrite_suite_for_local_dual_nodes( + suite_cfg=suite_cfg, + scene_ids=[self._CARGO_KV_UNIT_SCENE_ID], + primary_node_name="local-node-a", + secondary_node_name="local-node-b", + host_ip="10.1.1.119", + wheel_name="fluxon-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl", + controller_port=19080, + ) + + self.assertEqual( + generated["profiles"]["fluxon_tcp_thread"]["runtime"]["ci"]["scene_configs"][self._CARGO_KV_UNIT_SCENE_ID][ + "kv_transport_feature" + ], + "tcp_thread_transport", + ) + + def test_generated_suite_supports_additional_runner_native_cargo_scenes(self) -> None: + scene_ids = [ + self._CARGO_CLI_SCENE_ID, + self._CARGO_COMMU_SCENE_ID, + self._CARGO_COMMU_CONTRACT_SCENE_ID, + self._CARGO_FRAMEWORK_SCENE_ID, + self._CARGO_FS_SCENE_ID, + self._CARGO_FS_S3_GATEWAY_SCENE_ID, + self._CARGO_LIMIT_THIRDPARTY_SCENE_ID, + self._CARGO_MQ_SCENE_ID, + self._CARGO_OBSERVABILITY_SCENE_ID, + self._CARGO_OPS_SCENE_ID, + self._CARGO_PYO3_SCENE_ID, + ] + suite_cfg = 
_ENTRY._load_yaml_mapping(_ENTRY.DEFAULT_SUITE_PATH, ctx="suite") + generated = _ENTRY._rewrite_suite_for_local_dual_nodes( + suite_cfg=suite_cfg, + scene_ids=scene_ids, + primary_node_name="local-node-a", + secondary_node_name="local-node-b", + host_ip="10.1.1.119", + wheel_name="fluxon-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl", + controller_port=19080, + ) + + self.assertEqual(set(generated["scenes"].keys()), set(scene_ids)) + for scene_id in scene_ids: + self.assertEqual( + generated["scenes"][scene_id]["ci"]["runtime_contract"], + "rust_self_managed", + ) + self.assertEqual( + generated["scenes"][scene_id]["ci"]["subject"], + "rust", + ) + self.assertNotIn("commands", generated["scenes"][scene_id]["ci"]) + def test_generated_suite_supports_doc_page_ci_scene(self) -> None: suite_cfg = _ENTRY._load_yaml_mapping(_ENTRY.DEFAULT_SUITE_PATH, ctx="suite") generated = _ENTRY._rewrite_suite_for_local_dual_nodes( diff --git a/fluxon_test_stack/tests/test_pack_test_stack_rsc_cli.py b/fluxon_test_stack/tests/test_pack_test_stack_rsc_cli.py index d87b3fa..d3b970a 100644 --- a/fluxon_test_stack/tests/test_pack_test_stack_rsc_cli.py +++ b/fluxon_test_stack/tests/test_pack_test_stack_rsc_cli.py @@ -27,6 +27,141 @@ def _load_module(): class TestPackTestStackRscCli(unittest.TestCase): + def test_download_python_runtime_wheels_uses_matching_python_abi_interpreter(self) -> None: + with tempfile.TemporaryDirectory() as tmpdir: + out_dir = Path(tmpdir) / "wheelhouse" + out_dir.mkdir(parents=True, exist_ok=True) + expected_specs = ( + {"name": "pytest", "version": "8.3.5", "source": "wheel"}, + {"name": "etcd3", "version": "0.12.0", "source": "sdist"}, + ) + + def fake_check_call(argv, cwd=None): + self.assertEqual(cwd, str(REPO_ROOT)) + if argv[2:4] == ["pip", "download"]: + self.assertEqual(argv[0], "/usr/bin/python3.10") + self.assertIn("--python-version", argv) + self.assertIn("3.10", argv) + (out_dir / "pytest-8.3.5-py3-none-any.whl").write_text("wheel\n", encoding="utf-8") + 
return 0 + if argv[2:4] == ["pip", "wheel"]: + self.assertEqual(argv[0], "/usr/bin/python3.10") + (out_dir / "etcd3-0.12.0-py3-none-any.whl").write_text("wheel\n", encoding="utf-8") + return 0 + raise AssertionError(f"unexpected argv: {argv}") + + with ( + mock.patch.object( + _PACK, + "_python_executable_for_python_abi", + return_value="/usr/bin/python3.10", + ) as python_exe_mock, + mock.patch.object(_PACK, "_wheelhouse_resolves_offline", return_value=True) as resolve_mock, + mock.patch.object(_PACK.subprocess, "check_call", side_effect=fake_check_call) as check_call_mock, + ): + _PACK._download_python_runtime_wheels( + out_dir=out_dir, + python_abi="cpython3.10", + platform_tag="manylinux2014_x86_64", + expected_specs=expected_specs, + ) + + python_exe_mock.assert_called_once_with(python_abi="cpython3.10") + resolve_mock.assert_called_once_with( + wheelhouse_root=out_dir, + python_abi="cpython3.10", + expected_specs=expected_specs, + ) + self.assertEqual(check_call_mock.call_count, 2) + + def test_python_executable_for_python_abi_requires_matching_interpreter(self) -> None: + with mock.patch.object( + _PACK.shutil, + "which", + side_effect=lambda name: { + "python3.10": None, + "python3": "/usr/bin/python3", + "python": "/usr/bin/python", + }.get(name), + ): + with mock.patch.object( + _PACK, + "_python_executable_abi", + side_effect=lambda path: { + "/usr/bin/python3": "cpython3.12", + "/usr/bin/python": "cpython3.12", + }[path], + ): + with self.assertRaisesRegex( + RuntimeError, + "requires a matching Python interpreter on PATH", + ): + _PACK._python_executable_for_python_abi(python_abi="cpython3.10") + + def test_wheelhouse_resolves_offline_uses_matching_python_abi_interpreter(self) -> None: + with tempfile.TemporaryDirectory() as tmpdir: + wheelhouse_root = Path(tmpdir) / "wheelhouse" + wheelhouse_root.mkdir(parents=True, exist_ok=True) + expected_specs = ({"name": "pytest", "version": "8.3.5", "source": "wheel"},) + + with ( + mock.patch.object( + _PACK, + 
"_python_executable_for_python_abi", + return_value="/usr/bin/python3.10", + ) as python_exe_mock, + mock.patch.object(_PACK.subprocess, "check_call", return_value=0) as check_call_mock, + ): + ok = _PACK._wheelhouse_resolves_offline( + wheelhouse_root=wheelhouse_root, + python_abi="cpython3.10", + expected_specs=expected_specs, + ) + + self.assertTrue(ok) + python_exe_mock.assert_called_once_with(python_abi="cpython3.10") + argv = check_call_mock.call_args.args[0] + self.assertEqual(argv[0], "/usr/bin/python3.10") + self.assertEqual(argv[1:5], ["-m", "pip", "download", "--no-index"]) + self.assertIn(str(wheelhouse_root), argv) + + def test_prepare_python_runtime_wheelhouse_rebuilds_when_existing_wheelhouse_fails_offline_resolution(self) -> None: + with tempfile.TemporaryDirectory() as tmpdir: + prepared_root = Path(tmpdir) / "prepared" + scratch_root = Path(tmpdir) / "scratch" + wheelhouse_root = prepared_root / "python_runtime" / "cpython3.10" / "wheels" + wheelhouse_root.mkdir(parents=True, exist_ok=True) + (wheelhouse_root / "pytest-8.3.5-py3-none-any.whl").write_text("old\n", encoding="utf-8") + cfg = { + "dependency_sets": { + "base": { + "requirements": [ + {"pinned": "pytest==8.3.5", "source": "wheel"}, + ] + } + } + } + + def fake_download(*, out_dir, python_abi, platform_tag, expected_specs): + self.assertEqual(python_abi, "cpython3.10") + self.assertEqual(platform_tag, "manylinux2014_x86_64") + self.assertEqual(expected_specs, ({"name": "pytest", "version": "8.3.5", "source": "wheel"},)) + (out_dir / "pytest-8.3.5-py3-none-any.whl").write_text("new\n", encoding="utf-8") + + with ( + mock.patch.object(_PACK, "_wheelhouse_resolves_offline", side_effect=[False, True]) as resolve_mock, + mock.patch.object(_PACK, "_download_python_runtime_wheels", side_effect=fake_download) as download_mock, + ): + _PACK._prepare_python_runtime_wheelhouse_into_root( + prepared_root=prepared_root, + scratch_root=scratch_root, + python_runtime_cfg=cfg, + ) + + 
self.assertEqual(resolve_mock.call_count, 1) + download_mock.assert_called_once() + self.assertEqual((wheelhouse_root / "pytest-8.3.5-py3-none-any.whl").read_text(encoding="utf-8"), "new\n") + def test_resolve_transport_backends_from_ci_suite(self) -> None: backends = _PACK._resolve_transport_backends( config_path=(REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").resolve(), diff --git a/fluxon_test_stack/tests/test_runner_contract.py b/fluxon_test_stack/tests/test_runner_contract.py index f2e5a64..f5901bb 100644 --- a/fluxon_test_stack/tests/test_runner_contract.py +++ b/fluxon_test_stack/tests/test_runner_contract.py @@ -59,6 +59,14 @@ def _build_checks(selected_test_id: Optional[str]) -> List[Tuple[str, Callable[[ "ci_top_attention_doc_page_build_uses_online_docker_image", test_ci_top_attention_doc_page_build_uses_online_docker_image, ), + ( + "ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime", + test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime, + ), + ( + "ci_top_attention_additional_cargo_scenes_exist", + test_ci_top_attention_additional_cargo_scenes_exist, + ), ( "ci_top_attention_log_mgmt_scene_exists", test_ci_top_attention_log_mgmt_scene_exists, @@ -285,6 +293,65 @@ def test_ci_top_attention_log_mgmt_scene_exists() -> None: return print("PASS: test_ci_top_attention_log_mgmt_scene_exists") + +def test_ci_top_attention_additional_cargo_scenes_exist() -> None: + repo_root = Path(__file__).resolve().parents[2] + suite_cfg_path = repo_root / "fluxon_test_stack" / "ci_test_list.yaml" + suite_cfg = yaml.safe_load(suite_cfg_path.read_text(encoding="utf-8")) + if not isinstance(suite_cfg, dict): + print("FAIL: test_ci_top_attention_additional_cargo_scenes_exist - suite config is not a mapping") + return + + suite = _TEST_RUNNER._parse_suite_config(copy.deepcopy(suite_cfg)) + expected_scene_ids = { + "ci_top_attention_cargo_cli", + "ci_top_attention_cargo_commu", + "ci_top_attention_cargo_commu_contract", + 
"ci_top_attention_cargo_framework", + "ci_top_attention_cargo_fs", + "ci_top_attention_cargo_fs_s3_gateway", + "ci_top_attention_cargo_limit_thirdparty", + "ci_top_attention_cargo_mq", + "ci_top_attention_cargo_observability", + "ci_top_attention_cargo_ops", + "ci_top_attention_cargo_pyo3", + } + missing = sorted(scene_id for scene_id in expected_scene_ids if scene_id not in suite.scenes) + if missing: + print( + "FAIL: test_ci_top_attention_additional_cargo_scenes_exist - " + f"missing scenes: {missing!r}" + ) + return + for scene_id in sorted(expected_scene_ids): + scene = suite.scenes.get(scene_id) + if not isinstance(scene, dict): + print( + "FAIL: test_ci_top_attention_additional_cargo_scenes_exist - " + f"scene is not a mapping: {scene_id!r}" + ) + return + ci = scene.get("ci") + if not isinstance(ci, dict): + print( + "FAIL: test_ci_top_attention_additional_cargo_scenes_exist - " + f"scene.ci missing: {scene_id!r}" + ) + return + if ci.get("subject") != "rust": + print( + "FAIL: test_ci_top_attention_additional_cargo_scenes_exist - " + f"expected subject 'rust' for {scene_id!r}, got {ci.get('subject')!r}" + ) + return + if ci.get("runtime_contract") != "rust_self_managed": + print( + "FAIL: test_ci_top_attention_additional_cargo_scenes_exist - " + f"expected runtime_contract 'rust_self_managed' for {scene_id!r}, got {ci.get('runtime_contract')!r}" + ) + return + print("PASS: test_ci_top_attention_additional_cargo_scenes_exist") + def test_ci_top_attention_mq_core_uses_cluster_kv_owner_runtime() -> None: repo_root = Path(__file__).resolve().parents[2] suite_cfg_path = repo_root / "fluxon_test_stack" / "ci_test_list.yaml" @@ -339,5 +406,119 @@ def test_ci_top_attention_mq_core_uses_cluster_kv_owner_runtime() -> None: print("PASS: test_ci_top_attention_mq_core_uses_cluster_kv_owner_runtime") +def test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime() -> None: + repo_root = Path(__file__).resolve().parents[2] + suite_cfg_path = repo_root / 
"fluxon_test_stack" / "ci_test_list.yaml" + suite_cfg = yaml.safe_load(suite_cfg_path.read_text(encoding="utf-8")) + if not isinstance(suite_cfg, dict): + print("FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - suite config is not a mapping") + return + + suite_for_contract = copy.deepcopy(suite_cfg) + suite = _TEST_RUNNER._parse_suite_config(suite_for_contract) + cases = _TEST_RUNNER._expand_cases(suite) + case = next( + ( + item + for item in cases + if item.scene_id == "ci_top_attention_cargo_kv_unit" + and item.profile_id == "fluxon_tcp" + ), + None, + ) + if case is None: + print("FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - missing cargo kv unit case") + return + planned = _TEST_RUNNER._build_ci_execution_plan(case, suite) + if len(planned) != 1: + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + f"expected one planned case, got {len(planned)}" + ) + return + commands = planned[0].ci_commands + if len(commands) != 1: + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + f"expected one command, got {len(commands)}" + ) + return + command = commands[0] + if command.get("id") != "top_attention_cargo_kv_unit": + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + f"unexpected command id: {command.get('id')!r}" + ) + return + scene = suite.scenes.get("ci_top_attention_cargo_kv_unit") + if not isinstance(scene, dict): + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + "missing cargo kv unit scene" + ) + return + ci = scene.get("ci") + if not isinstance(ci, dict): + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + "scene.ci missing" + ) + return + if ci.get("subject") != "rust": + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + f"expected subject 'rust', got {ci.get('subject')!r}" 
+ ) + return + if ci.get("runtime_contract") != "rust_self_managed": + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + f"expected runtime_contract 'rust_self_managed', got {ci.get('runtime_contract')!r}" + ) + return + profile = suite.profiles.get("fluxon_tcp") + if not isinstance(profile, dict): + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + "missing fluxon_tcp profile" + ) + return + runtime = profile.get("runtime") + if not isinstance(runtime, dict): + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + "profile.runtime missing" + ) + return + profile_ci = runtime.get("ci") + if not isinstance(profile_ci, dict): + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + "profile.runtime.ci missing" + ) + return + scene_configs = profile_ci.get("scene_configs") + if not isinstance(scene_configs, dict): + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + "profile.runtime.ci.scene_configs missing" + ) + return + cargo_scene_config = scene_configs.get("ci_top_attention_cargo_kv_unit") + if not isinstance(cargo_scene_config, dict): + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + "missing ci_top_attention_cargo_kv_unit scene config" + ) + return + if cargo_scene_config.get("kv_transport_feature") != "tcp_thread_transport": + print( + "FAIL: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime - " + f"unexpected kv_transport_feature: {cargo_scene_config.get('kv_transport_feature')!r}" + ) + return + print("PASS: test_ci_top_attention_cargo_kv_unit_uses_rust_self_managed_runtime") + + if __name__ == "__main__": raise SystemExit(main()) diff --git a/fluxon_test_stack/tests/test_test_profile_adapter_contract.py b/fluxon_test_stack/tests/test_test_profile_adapter_contract.py new file mode 100644 index 0000000..5fda610 --- 
/dev/null +++ b/fluxon_test_stack/tests/test_test_profile_adapter_contract.py @@ -0,0 +1,91 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import importlib.util +import sys +import tempfile +import unittest +from pathlib import Path +from unittest import mock + +import yaml + + +REPO_ROOT = Path(__file__).resolve().parents[2] +MODULE_PATH = REPO_ROOT / "fluxon_test_stack" / "test_profile_adapter.py" + + +def _load_module(): + module_dir = MODULE_PATH.parent + sys.path.insert(0, str(module_dir)) + try: + spec = importlib.util.spec_from_file_location("fluxon_test_stack_test_profile_adapter_contract", MODULE_PATH) + assert spec is not None and spec.loader is not None + mod = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = mod + spec.loader.exec_module(mod) + return mod + finally: + if sys.path and sys.path[0] == str(module_dir): + sys.path.pop(0) + + +_ADAPTER = _load_module() + + +class TestTestProfileAdapterContract(unittest.TestCase): + def test_action_collect_writes_per_instance_status_snapshots(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + instances = [ + _ADAPTER._InstanceReq( + id="coordinator", + k8s_ref="deployment/coord", + workload_kind="Deployment", + workload_name="coord", + authority="coord", + target="local-node-a", + controller_target="controller-a", + node_ip="127.0.0.1", + lifecycle="service", + endpoint_scheme=None, + host_port=None, + payload_file_rel=None, + payload_file_abs=None, + payload_dest_path=None, + ), + _ADAPTER._InstanceReq( + id="node_0", + k8s_ref="deployment/node", + workload_kind="Deployment", + workload_name="node", + authority="node", + target="local-node-b", + controller_target="controller-b", + node_ip="127.0.0.2", + lifecycle="job", + endpoint_scheme=None, + host_port=None, + payload_file_rel=None, + payload_file_abs=None, + payload_dest_path=None, + ), + ] + statuses = [ + (200, {"ok": True, "instance_id": "coordinator"}), + (503, {"ok": False, 
"instance_id": "node_0"}), + ] + + with mock.patch.object(_ADAPTER, "_http_status_allow_error", side_effect=statuses) as status_mock: + _ADAPTER._action_collect(run_dir, "http://controller", instances) + + self.assertEqual(status_mock.call_count, 2) + coordinator_payload = yaml.safe_load((run_dir / "logs" / "coordinator" / "status.yaml").read_text(encoding="utf-8")) + node_payload = yaml.safe_load((run_dir / "logs" / "node_0" / "status.yaml").read_text(encoding="utf-8")) + self.assertEqual(coordinator_payload, {"status_code": 200, "status": {"ok": True, "instance_id": "coordinator"}}) + self.assertEqual(node_payload, {"status_code": 503, "status": {"ok": False, "instance_id": "node_0"}}) + + +if __name__ == "__main__": + raise SystemExit(unittest.main()) diff --git a/fluxon_test_stack/tests/test_test_runner_testbed_contract.py b/fluxon_test_stack/tests/test_test_runner_testbed_contract.py index 86f41cb..20b9348 100644 --- a/fluxon_test_stack/tests/test_test_runner_testbed_contract.py +++ b/fluxon_test_stack/tests/test_test_runner_testbed_contract.py @@ -5,6 +5,7 @@ import importlib.util import json import os +import subprocess import sys import tarfile import tempfile @@ -35,13 +36,27 @@ def _load_module(): _RUNNER = _load_module() +_CI_RUNTIME_MOD = sys.modules["test_runner_ci_runtime"] class TestTestRunnerTestbedContract(unittest.TestCase): def test_write_ci_master_owner_configs_emits_owner_large_file_paths(self) -> None: with tempfile.TemporaryDirectory() as td: - run_dir = Path(td) + run_dir = Path(td) / "runner_run" / "results" / "ci_case" / "run_3" + run_dir.mkdir(parents=True, exist_ok=True) resolved_case = { + "runtime": { + "run_dir": str(run_dir), + }, + "profile": { + "test_stack": { + "kind": "FLUXON", + "port_alloc": { + "kv_master_port_base": 50061, + "kv_master_port_stride": 10, + }, + } + }, "deploy": { "instances": [ {"id": "master", "deployer": {"target": "local-node-a"}}, @@ -53,7 +68,7 @@ def 
test_write_ci_master_owner_configs_emits_owner_large_file_paths(self) -> Non with mock.patch.object(_RUNNER, "_ci_base_runtime_service_target_ip", side_effect=["127.0.0.1", "127.0.0.1"]): with mock.patch.object(_RUNNER, "_ci_base_runtime_service_port", side_effect=[19180, 19190]): - _, owner_path = _RUNNER._write_ci_master_owner_configs( + master_path, owner_path = _RUNNER._write_ci_master_owner_configs( resolved_case, run_dir=run_dir, cluster_name="ci_cluster", @@ -61,42 +76,179 @@ def test_write_ci_master_owner_configs_emits_owner_large_file_paths(self) -> Non owner_dram_bytes=1073741824, ) + master_cfg = yaml.safe_load(master_path.read_text(encoding="utf-8")) owner_cfg = yaml.safe_load(owner_path.read_text(encoding="utf-8")) + expected_master_port = 50061 + 10 * (3 - 1) + _RUNNER._test_stack_runner_port_slot( + runner_root=_RUNNER._test_stack_runner_root(run_dir), + stride=10, + ) + self.assertEqual(master_cfg["port"], expected_master_port) self.assertEqual( owner_cfg["fluxonkv_spec"]["large_file_paths"], [str((run_dir / "services" / "owner_0" / "large").resolve())], ) self.assertNotIn("shared_file_path", owner_cfg["fluxonkv_spec"]) + def test_ci_required_ports_includes_local_master_kv_port(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) / "runner_run" / "results" / "ci_case" / "run_2" + run_dir.mkdir(parents=True, exist_ok=True) + resolved_case = { + "runtime": { + "run_dir": str(run_dir), + }, + "profile": { + "test_stack": { + "kind": "FLUXON", + "port_alloc": { + "kv_master_port_base": 50061, + "kv_master_port_stride": 10, + }, + } + }, + "deploy": { + "instances": [ + {"id": "master", "deployer": {"target": "local-node-a"}}, + ], + "target_ip_map": {"local-node-a": "127.0.0.1"}, + }, + "runtime_model": { + "test_bed": {"kind": "ops"}, + "base_runtime": {"service_ids": []}, + "case_runtime": {"instance_ids": ["master"]}, + }, + } + + with mock.patch.object(_RUNNER, "_ci_runtime_cleanup_case", return_value=resolved_case): + 
with mock.patch.object(_RUNNER, "_ci_local_runtime_targets", return_value={"local-node-a"}): + required_ports = _RUNNER._ci_required_ports(resolved_case) + + expected_master_port = 50061 + 10 * (2 - 1) + _RUNNER._test_stack_runner_port_slot( + runner_root=_RUNNER._test_stack_runner_root(run_dir), + stride=10, + ) + self.assertEqual(required_ports, [("ci master", expected_master_port)]) + def test_ci_runtime_python_executable_requires_python310_on_path(self) -> None: with mock.patch.object(_RUNNER.shutil, "which", return_value=None): - with self.assertRaisesRegex(ValueError, "requires python3.10 on PATH"): + with self.assertRaisesRegex(ValueError, "requires a Python 3.10 interpreter on PATH"): _RUNNER._ci_runtime_python_executable() - def test_create_ci_runtime_venv_uses_python310(self) -> None: + def test_ci_runtime_python_executable_accepts_python3_alias_when_it_is_python310(self) -> None: + with mock.patch.object( + _RUNNER.shutil, + "which", + side_effect=lambda name: { + "python3.10": None, + "python3": "/usr/bin/python3", + "python": "/usr/bin/python", + }.get(name), + ): + with mock.patch.object(_CI_RUNTIME_MOD, "_python_executable_abi", return_value="cpython3.10"): + self.assertEqual(_RUNNER._ci_runtime_python_executable(), "/usr/bin/python3") + + def test_create_ci_runtime_venv_uses_python310_abi_and_seeds_pip(self) -> None: with tempfile.TemporaryDirectory() as td: run_dir = Path(td) venv_dir = (run_dir / "venv").resolve() expected_venv_python = (venv_dir / "bin" / "python3").resolve() + observed_calls: list[list[str]] = [] def _fake_create_venv(argv: list[str], *, cwd: str) -> None: - self.assertEqual( - argv, - ["/usr/bin/python3.10", "-m", "venv", str(venv_dir)], - ) + observed_calls.append(argv) self.assertEqual(cwd, str(run_dir)) - expected_venv_python.parent.mkdir(parents=True, exist_ok=True) - expected_venv_python.write_text("#!/bin/sh\n", encoding="utf-8") + if len(observed_calls) == 1: + self.assertEqual( + argv, + [ + "/usr/bin/python3.10", + 
"-m", + "venv", + "--without-pip", + str(venv_dir), + ], + ) + expected_venv_python.parent.mkdir(parents=True, exist_ok=True) + expected_venv_python.write_text("#!/bin/sh\n", encoding="utf-8") + return + if len(observed_calls) == 2: + self.assertEqual( + argv, + [ + str(expected_venv_python), + "-m", + "ensurepip", + "--upgrade", + "--default-pip", + ], + ) + return + if len(observed_calls) == 3: + self.assertEqual( + argv, + [ + str(expected_venv_python), + "-m", + "pip", + "--version", + ], + ) + return + self.fail(f"unexpected _run_subprocess call: argv={argv!r}") with mock.patch.object(_RUNNER.shutil, "which", return_value="/usr/bin/python3.10"): - with mock.patch.object(_RUNNER, "_run_subprocess", side_effect=_fake_create_venv) as run_subprocess_mock: - with mock.patch.object(_RUNNER, "_assert_ci_runtime_python_abi") as assert_python_abi: - venv_python = _RUNNER._create_ci_runtime_venv(run_dir=run_dir) + with mock.patch.object(_CI_RUNTIME_MOD, "_python_executable_abi", return_value="cpython3.10"): + with mock.patch.object(_RUNNER, "_run_subprocess", side_effect=_fake_create_venv) as run_subprocess_mock: + with mock.patch.object(_RUNNER, "_assert_ci_runtime_python_abi") as assert_python_abi: + venv_python = _RUNNER._create_ci_runtime_venv(run_dir=run_dir) self.assertEqual(venv_python, expected_venv_python) - run_subprocess_mock.assert_called_once() + self.assertEqual( + observed_calls, + [ + ["/usr/bin/python3.10", "-m", "venv", "--without-pip", str(venv_dir)], + [str(expected_venv_python), "-m", "ensurepip", "--upgrade", "--default-pip"], + [str(expected_venv_python), "-m", "pip", "--version"], + ], + ) + self.assertEqual(run_subprocess_mock.call_count, 3) assert_python_abi.assert_called_once_with(venv_python=expected_venv_python) + def test_runner_native_bin_kvtest_scene_stays_on_direct_wrapper_command(self) -> None: + suite = _RUNNER._parse_suite_config( + yaml.safe_load( + (REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").read_text(encoding="utf-8") + 
) + ) + cases = _RUNNER._expand_cases(suite) + case = next(item for item in cases if item.scene_id == "ci_top_attention_bin_kvtest" and item.profile_id == "fluxon_tcp") + + planned = _RUNNER._build_ci_execution_plan(case, suite) + + self.assertEqual(len(planned), 1) + self.assertEqual(planned[0].ci_commands[0]["id"], "top_attention_bin_kvtest") + self.assertIn( + "fluxon_test_stack/top_attention_test_index/_bin_kvtest.py", + planned[0].ci_commands[0]["command"], + ) + + def test_run_subprocess_reports_cwd_and_argv_on_failure(self) -> None: + completed = subprocess.CompletedProcess( + args=["/usr/bin/python3", "-c", "raise SystemExit(2)"], + returncode=2, + stdout="", + stderr="boom\n", + ) + with mock.patch.object(_RUNNER.subprocess, "run", return_value=completed): + with self.assertRaisesRegex( + RuntimeError, + r"command failed: rc=2 cwd=/tmp argv=/usr/bin/python3 -c 'raise SystemExit\(2\)'", + ): + _RUNNER._run_subprocess( + ["/usr/bin/python3", "-c", "raise SystemExit(2)"], + cwd="/tmp", + ) + def test_assert_ci_runtime_python_abi_accepts_python310_venv(self) -> None: with mock.patch.object(_RUNNER.subprocess, "check_output", return_value="cpython3.10\n") as check_output_mock: _RUNNER._assert_ci_runtime_python_abi(venv_python=Path("/tmp/venv/bin/python3")) @@ -159,7 +311,415 @@ def test_finalize_ci_case_runtime_deletes_each_apply_id_once(self) -> None: [call.kwargs["apply_id"] for call in delete_apply.call_args_list], ["apply-runner", "apply-cluster"], ) - cleanup_runtime.assert_called_once_with(resolved_case, timeout_s=120) + cleanup_runtime.assert_called_once_with( + resolved_case, + timeout_s=120, + instance_ids=["ci_runner", "master", "owner_0"], + ) + + def test_finalize_ci_case_runtime_cleanup_skips_untracked_ci_runner(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + tracking = _RUNNER._CaseRuntimeTracking( + ci_attempted_instance_ids=["master", "owner_0", "ci_runner"], + ci_apply_ids={ + "master": "apply-cluster", + 
"owner_0": "apply-cluster", + }, + ) + resolved_case = { + "case": { + "run_mode": _RUNNER.RUN_MODE_FULL_ONCE, + "case_id": "ci_top_attention_mq_core__n1_kvowner_dram_20gib__fluxon_tcp_thread", + } + } + + with mock.patch.object(_RUNNER, "_delete_apply_id") as delete_apply: + with mock.patch.object(_RUNNER, "_ci_cleanup_runtime") as cleanup_runtime: + _RUNNER._finalize_ci_case_runtime( + resolved_case, + run_dir=run_dir, + runtime_tracking=tracking, + outcome=_RUNNER.RUN_OUTCOME_SUCCESS, + ) + + self.assertEqual([call.kwargs["apply_id"] for call in delete_apply.call_args_list], ["apply-cluster"]) + cleanup_runtime.assert_called_once_with( + resolved_case, + timeout_s=120, + instance_ids=["master", "owner_0"], + ) + + def test_execute_ci_case_releases_ci_runner_after_terminal_success(self) -> None: + resolved_case = { + "case": { + "case_id": "ci_top_attention_mq_core__n1_kvowner_dram_20gib__fluxon_tcp_thread", + "case_key": "case-key", + }, + "scene": { + "ci": { + "commands": [ + { + "id": "top_attention_mq_core", + "command": "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_mq_core.py", + "timeout_seconds": 60, + } + ] + } + }, + } + prepared_case = _RUNNER._PreparedCase( + plan=_RUNNER._CasePlan( + case_family=_RUNNER.CASE_FAMILY_CI, + prepare_phases=(), + execute_phases=( + _RUNNER._RuntimePhase( + phase_id="ci_runner", + layer=_RUNNER.RUNTIME_LAYER_CASE, + instance_ids=("ci_runner",), + write_ctx="CI execute", + ), + ), + ), + ci_runner_exit_code_baseline=None, + ) + tracking = _RUNNER._CaseRuntimeTracking(ci_apply_ids={"ci_runner": "apply-runner"}) + + with ( + mock.patch.object( + _RUNNER, + "_deploy_runtime_phase", + return_value={"history_id": "apply-runner"}, + ), + mock.patch.object(_RUNNER, "_record_ci_apply_id") as record_apply, + mock.patch.object(_RUNNER, "_wait_ci_instance_ready") as wait_ready, + mock.patch.object(_RUNNER, "_wait_ci_runner_exit_code", return_value=0) as wait_exit_code, + mock.patch.object(_RUNNER, "_delete_apply_id") as 
delete_apply, + ): + executed = _RUNNER._execute_ci_case( + planned_case=mock.Mock(ci_commands=[]), + resolved_case=resolved_case, + run_dir=Path("/tmp/ci_run_dir"), + run_index=3, + started_at=100, + prepared_case=prepared_case, + runtime_tracking=tracking, + ) + + self.assertEqual(executed.outcome, _RUNNER.RUN_OUTCOME_SUCCESS) + self.assertEqual(executed.summary["ci"], {"rc": 0}) + record_apply.assert_called_once() + wait_ready.assert_called_once_with(resolved_case, instance_id="ci_runner") + wait_exit_code.assert_called_once() + delete_apply.assert_called_once_with( + resolved_case, + apply_id="apply-runner", + ctx="CI ci_runner terminal success apply", + ) + self.assertNotIn("ci_runner", tracking.ci_apply_ids) + + def test_execute_ci_case_preserves_success_when_ci_runner_cleanup_fails(self) -> None: + resolved_case = { + "case": { + "case_id": "ci_top_attention_mq_core__n1_kvowner_dram_20gib__fluxon_tcp_thread", + "case_key": "case-key", + }, + "scene": { + "ci": { + "commands": [ + { + "id": "top_attention_mq_core", + "command": "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_mq_core.py", + "timeout_seconds": 60, + } + ] + } + }, + } + prepared_case = _RUNNER._PreparedCase( + plan=_RUNNER._CasePlan( + case_family=_RUNNER.CASE_FAMILY_CI, + prepare_phases=(), + execute_phases=( + _RUNNER._RuntimePhase( + phase_id="ci_runner", + layer=_RUNNER.RUNTIME_LAYER_CASE, + instance_ids=("ci_runner",), + write_ctx="CI execute", + ), + ), + ), + ci_runner_exit_code_baseline=None, + ) + tracking = _RUNNER._CaseRuntimeTracking(ci_apply_ids={"ci_runner": "apply-runner"}) + + with ( + mock.patch.object( + _RUNNER, + "_deploy_runtime_phase", + return_value={"history_id": "apply-runner"}, + ), + mock.patch.object(_RUNNER, "_record_ci_apply_id"), + mock.patch.object(_RUNNER, "_wait_ci_instance_ready"), + mock.patch.object(_RUNNER, "_wait_ci_runner_exit_code", return_value=0), + mock.patch.object(_RUNNER, "_delete_apply_id", side_effect=RuntimeError("controller stop 
failed")), + mock.patch("builtins.print") as print_mock, + ): + executed = _RUNNER._execute_ci_case( + planned_case=mock.Mock(ci_commands=[]), + resolved_case=resolved_case, + run_dir=Path("/tmp/ci_run_dir"), + run_index=3, + started_at=100, + prepared_case=prepared_case, + runtime_tracking=tracking, + ) + + self.assertEqual(executed.outcome, _RUNNER.RUN_OUTCOME_SUCCESS) + self.assertEqual(executed.summary["ci"], {"rc": 0}) + self.assertNotIn("ci_runner", tracking.ci_apply_ids) + self.assertTrue(any("terminal success cleanup failed" in str(call) for call in print_mock.call_args_list)) + + def test_wait_ci_runner_exit_code_prefers_exit_code_file_after_terminal_status(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + exit_code_path = run_dir / "logs" / "ci_runner" / "exit_code.txt" + exit_code_path.parent.mkdir(parents=True, exist_ok=True) + resolved_case = { + "deploy": { + "controller_url": "http://127.0.0.1:19080/r/ops/fluxon_testbed", + "target_ip_map": {"logic-a": "10.0.0.2"}, + "instances": [ + { + "id": "ci_runner", + "k8s_ref": "deployment/ci-runner", + "deployer": {"target": "logic-a"}, + } + ], + } + } + baseline_state = None + observe_states = [None, None, _RUNNER._ObservedFileState(size=2, mtime_ns=1)] + status_calls = [ + {"ok": True, "running": False, "exit_code": 143}, + ] + file_reads = iter([None, "0\n"]) + + def _fake_observe_file_state(path: Path): + self.assertEqual(path, exit_code_path.resolve()) + if observe_states: + return observe_states.pop(0) + return _RUNNER._ObservedFileState(size=2, mtime_ns=1) + + def _fake_instance_read_text_if_present(*_args, **_kwargs): + return next(file_reads, "0\n") + + def _fake_instance_status(*_args, **_kwargs): + return status_calls.pop(0) + + with ( + mock.patch.object(_RUNNER, "_print_ci_wait_progress", return_value=(0, 999999999.0, "")), + mock.patch.object(_RUNNER, "_observe_file_state", side_effect=_fake_observe_file_state), + mock.patch.object(_RUNNER, 
"_instance_read_text_if_present", side_effect=_fake_instance_read_text_if_present), + mock.patch.object(_RUNNER, "_instance_status", side_effect=_fake_instance_status), + mock.patch.object(_RUNNER.time, "sleep"), + ): + exit_code_path.write_text("0\n", encoding="utf-8") + rc = _RUNNER._wait_ci_runner_exit_code( + resolved_case=resolved_case, + run_dir=run_dir, + timeout_s=60, + baseline_state=baseline_state, + ) + + self.assertEqual(rc, 0) + + def test_wait_ci_runner_exit_code_prefers_stdout_marker_after_terminal_status(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + exit_code_path = (run_dir / "logs" / "ci_runner" / "exit_code.txt").resolve() + stdout_path = (run_dir / "logs" / "ci_runner" / "stdout.log").resolve() + stdout_path.parent.mkdir(parents=True, exist_ok=True) + resolved_case = { + "deploy": { + "controller_url": "http://127.0.0.1:19080/r/ops/fluxon_testbed", + "target_ip_map": {"logic-a": "10.0.0.2"}, + "instances": [ + { + "id": "ci_runner", + "k8s_ref": "deployment/ci-runner", + "deployer": {"target": "logic-a"}, + } + ], + } + } + status_calls = [ + {"ok": True, "running": False, "exit_code": 143}, + ] + stdout_reads = iter( + [ + "[ci_runner] running tests\n", + "[ci_runner] SUCCESS rc=0\n[ci_runner] wrote exit_code=0; holding until controller stop\n", + ] + ) + + def _fake_observe_file_state(path: Path): + self.assertEqual(path, exit_code_path) + return None + + def _fake_instance_read_text_if_present(*_args, **kwargs): + path = kwargs["path"] + if path == exit_code_path: + return None + self.assertEqual(path, stdout_path) + return next(stdout_reads) + + def _fake_instance_status(*_args, **_kwargs): + return status_calls.pop(0) + + with ( + mock.patch.object(_RUNNER, "_print_ci_wait_progress", return_value=(0, 999999999.0, "")), + mock.patch.object(_RUNNER, "_observe_file_state", side_effect=_fake_observe_file_state), + mock.patch.object(_RUNNER, "_instance_read_text_if_present", 
side_effect=_fake_instance_read_text_if_present), + mock.patch.object(_RUNNER, "_instance_status", side_effect=_fake_instance_status), + mock.patch.object(_RUNNER.time, "sleep"), + ): + rc = _RUNNER._wait_ci_runner_exit_code( + resolved_case=resolved_case, + run_dir=run_dir, + timeout_s=60, + baseline_state=None, + ) + + self.assertEqual(rc, 0) + + def test_wait_ci_runner_exit_code_returns_progress_stdout_marker_without_second_read(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + exit_code_path = (run_dir / "logs" / "ci_runner" / "exit_code.txt").resolve() + resolved_case = { + "deploy": { + "controller_url": "http://127.0.0.1:19080/r/ops/fluxon_testbed", + "target_ip_map": {"logic-a": "10.0.0.2"}, + "instances": [ + { + "id": "ci_runner", + "k8s_ref": "deployment/ci-runner", + "deployer": {"target": "logic-a"}, + } + ], + } + } + stdout_chunk = ( + "[ci_runner] SUCCESS rc=0\n" + "[ci_runner] wrote exit_code=0; holding until controller stop\n" + ) + + with ( + mock.patch.object( + _RUNNER, + "_print_ci_wait_progress", + return_value=(len(stdout_chunk), 999999999.0, stdout_chunk), + ), + mock.patch.object(_RUNNER, "_observe_file_state", side_effect=AssertionError("exit_code should not be reread")), + mock.patch.object(_RUNNER, "_instance_read_text_if_present", side_effect=AssertionError("stdout should not be reread")), + mock.patch.object(_RUNNER, "_instance_status", side_effect=AssertionError("status should not be queried")), + ): + rc = _RUNNER._wait_ci_runner_exit_code( + resolved_case=resolved_case, + run_dir=run_dir, + timeout_s=60, + baseline_state=None, + ) + + self.assertEqual(exit_code_path.name, "exit_code.txt") + self.assertEqual(rc, 0) + + def test_wait_ci_runner_exit_code_uses_stdout_marker_when_exit_code_file_is_empty(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + exit_code_path = run_dir / "logs" / "ci_runner" / "exit_code.txt" + stdout_path = run_dir / "logs" / "ci_runner" / 
"stdout.log" + exit_code_path.parent.mkdir(parents=True, exist_ok=True) + exit_code_path.write_text("", encoding="utf-8") + stdout_path.write_text( + "[ci_runner] SUCCESS rc=0\n[ci_runner] wrote exit_code=0; holding until controller stop\n", + encoding="utf-8", + ) + resolved_case = { + "deploy": { + "controller_url": "http://127.0.0.1:19080/r/ops/fluxon_testbed", + "target_ip_map": {"logic-a": "10.0.0.2"}, + "instances": [ + { + "id": "ci_runner", + "k8s_ref": "deployment/ci-runner", + "deployer": {"target": "logic-a"}, + } + ], + } + } + + rc = _RUNNER._read_ci_runner_exit_code_if_present( + resolved_case=resolved_case, + run_dir=run_dir, + baseline_state=None, + local_ctx="ci_runner.exit_code", + remote_ctx="ci_runner.remote_exit_code", + ) + + self.assertEqual(rc, 0) + + def test_wait_ci_runner_exit_code_resume_prefers_exit_code_file_after_terminal_status(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + exit_code_path = run_dir / "logs" / "ci_runner" / "exit_code.txt" + exit_code_path.parent.mkdir(parents=True, exist_ok=True) + resolved_case = { + "deploy": { + "controller_url": "http://127.0.0.1:19080/r/ops/fluxon_testbed", + "target_ip_map": {"logic-a": "10.0.0.2"}, + "instances": [ + { + "id": "ci_runner", + "k8s_ref": "deployment/ci-runner", + "deployer": {"target": "logic-a"}, + } + ], + } + } + observe_states = [None, None, _RUNNER._ObservedFileState(size=2, mtime_ns=1)] + status_calls = [ + {"ok": True, "running": False, "exit_code": 143}, + ] + file_reads = iter([None, "0\n"]) + + def _fake_observe_file_state(path: Path): + self.assertEqual(path, exit_code_path.resolve()) + if observe_states: + return observe_states.pop(0) + return _RUNNER._ObservedFileState(size=2, mtime_ns=1) + + def _fake_instance_read_text_if_present(*_args, **_kwargs): + return next(file_reads, "0\n") + + def _fake_instance_status(*_args, **_kwargs): + return status_calls.pop(0) + + with ( + mock.patch.object(_RUNNER, "_observe_file_state", 
side_effect=_fake_observe_file_state), + mock.patch.object(_RUNNER, "_instance_read_text_if_present", side_effect=_fake_instance_read_text_if_present), + mock.patch.object(_RUNNER, "_instance_status", side_effect=_fake_instance_status), + mock.patch.object(_RUNNER.time, "sleep"), + ): + exit_code_path.write_text("0\n", encoding="utf-8") + rc = _RUNNER._wait_ci_runner_exit_code_resume( + resolved_case=resolved_case, + run_dir=run_dir, + timeout_s=60, + ) + + self.assertEqual(rc, 0) def test_finalize_ci_case_runtime_preserves_structured_instance_ids(self) -> None: with tempfile.TemporaryDirectory() as td: @@ -198,6 +758,110 @@ def test_finalize_ci_case_runtime_preserves_structured_instance_ids(self) -> Non }, ) + def test_finalize_test_stack_case_runtime_collects_status_and_records_collect_error(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + summary_path = run_dir / "summary.yaml" + _RUNNER._write_yaml_file( + summary_path, + { + "schema_version": _RUNNER.SCHEMA_VERSION, + "case_id": "bench_case", + "case_key": "bench_case_key", + "run_index": 1, + "outcome": _RUNNER.RUN_OUTCOME_FAILED, + "counted": False, + "timing": { + "started_at_unix_s": 100, + "finished_at_unix_s": 200, + }, + "test_stack": { + "coordinator_addr": "127.0.0.1:19999", + "completion_signal": "benchmark_result_json", + "result_path": str((run_dir / "benchmark_result.json").resolve()), + "result": None, + "error": "RuntimeError: benchmark failed", + "collect_error": None, + }, + }, + ) + resolved_case = { + "case": { + "run_mode": _RUNNER.RUN_MODE_DEBUG_ONE_BY_ONE, + "case_id": "bench_case", + "case_key": "bench_case_key", + }, + "deploy": { + "instances": [ + {"id": "coordinator", "deployer": {"target": "local-node-a"}}, + {"id": "node_0", "deployer": {"target": "local-node-b"}}, + ] + }, + } + tracking = _RUNNER._CaseRuntimeTracking( + ts_coord_deploy_attempted=True, + ts_coord_apply_id="apply-coord", + ts_nodes_deploy_attempted=True, + 
ts_nodes_apply_id="apply-node", + ) + + def _fake_run_adapter_action(resolved_case, *, run_dir: Path, action: str): + self.assertEqual(action, "collect") + instances = _RUNNER._require_list(resolved_case["deploy"]["instances"], "resolved_case.deploy.instances") + for instance in instances: + inst_id = _RUNNER._require_str(instance.get("id"), "deploy.instances[].id") + inst_dir = (run_dir / "logs" / inst_id).resolve() + inst_dir.mkdir(parents=True, exist_ok=True) + _RUNNER._write_yaml_file( + inst_dir / "status.yaml", + {"status_code": 500, "status": {"ok": False, "instance_id": inst_id}}, + ) + raise RuntimeError("collect boom") + + with mock.patch.object(_RUNNER, "_run_adapter_action", side_effect=_fake_run_adapter_action): + with mock.patch.object(_RUNNER, "_delete_apply_id") as delete_apply: + _RUNNER._finalize_test_stack_case_runtime( + resolved_case, + run_dir=run_dir, + runtime_tracking=tracking, + outcome=_RUNNER.RUN_OUTCOME_FAILED, + ) + + delete_apply.assert_not_called() + self.assertTrue((run_dir / "logs" / "coordinator" / "status.yaml").exists()) + self.assertTrue((run_dir / "logs" / "node_0" / "status.yaml").exists()) + updated_summary = yaml.safe_load(summary_path.read_text(encoding="utf-8")) + self.assertEqual( + updated_summary["test_stack"]["collect_error"], + "RuntimeError: collect boom", + ) + + def test_finalize_error_preserves_success_for_ci_and_bench(self) -> None: + self.assertTrue( + _RUNNER._preserve_success_after_finalize_error( + case_family=_RUNNER.CASE_FAMILY_CI, + outcome=_RUNNER.RUN_OUTCOME_SUCCESS, + ) + ) + self.assertTrue( + _RUNNER._preserve_success_after_finalize_error( + case_family=_RUNNER.CASE_FAMILY_BENCH, + outcome=_RUNNER.RUN_OUTCOME_SUCCESS, + ) + ) + self.assertFalse( + _RUNNER._preserve_success_after_finalize_error( + case_family=_RUNNER.CASE_FAMILY_CI, + outcome=_RUNNER.RUN_OUTCOME_FAILED, + ) + ) + self.assertFalse( + _RUNNER._preserve_success_after_finalize_error( + case_family=_RUNNER.CASE_FAMILY_INFER, + 
outcome=_RUNNER.RUN_OUTCOME_SUCCESS, + ) + ) + def test_write_ci_scene_config_yaml_emits_structured_scene_config(self) -> None: with tempfile.TemporaryDirectory() as td: run_dir = Path(td) @@ -313,6 +977,77 @@ def test_top_attention_ci_execution_plan_is_runner_native(self) -> None: self.assertEqual(planned[0].ci_commands[0]["id"], "top_attention_bin_kvtest") self.assertIn("--case-config __RUN_DIR__/configs/ci_scene_config.yaml", planned[0].ci_commands[0]["command"]) + def test_top_attention_cargo_fs_core_ci_execution_plan_is_runner_native(self) -> None: + suite_cfg = yaml.safe_load((_RUNNER.RUNNER_REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").read_text(encoding="utf-8")) + suite = _RUNNER._parse_suite_config(suite_cfg) + cases = _RUNNER._expand_cases(suite) + case = next(item for item in cases if item.scene_id == "ci_top_attention_cargo_fs_core" and item.profile_id == "fluxon_tcp") + planned = _RUNNER._build_ci_execution_plan(case, suite) + self.assertEqual(len(planned), 1) + self.assertEqual(planned[0].ci_commands[0]["id"], "top_attention_cargo_fs_core") + self.assertIn( + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_fs_core.py", + planned[0].ci_commands[0]["command"], + ) + self.assertNotIn("--case-config", planned[0].ci_commands[0]["command"]) + + def test_top_attention_cargo_util_ci_execution_plan_is_runner_native(self) -> None: + suite_cfg = yaml.safe_load((_RUNNER.RUNNER_REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").read_text(encoding="utf-8")) + suite = _RUNNER._parse_suite_config(suite_cfg) + cases = _RUNNER._expand_cases(suite) + case = next(item for item in cases if item.scene_id == "ci_top_attention_cargo_util" and item.profile_id == "fluxon_tcp") + planned = _RUNNER._build_ci_execution_plan(case, suite) + self.assertEqual(len(planned), 1) + self.assertEqual(planned[0].ci_commands[0]["id"], "top_attention_cargo_util") + self.assertIn( + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_util.py", + 
planned[0].ci_commands[0]["command"], + ) + self.assertIn("--case-config __RUN_DIR__/configs/ci_scene_config.yaml", planned[0].ci_commands[0]["command"]) + + def test_top_attention_cargo_kv_unit_ci_execution_plan_is_runner_native(self) -> None: + suite_cfg = yaml.safe_load((_RUNNER.RUNNER_REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").read_text(encoding="utf-8")) + suite = _RUNNER._parse_suite_config(suite_cfg) + cases = _RUNNER._expand_cases(suite) + case = next(item for item in cases if item.scene_id == "ci_top_attention_cargo_kv_unit" and item.profile_id == "fluxon_tcp") + planned = _RUNNER._build_ci_execution_plan(case, suite) + self.assertEqual(len(planned), 1) + self.assertEqual(planned[0].ci_commands[0]["id"], "top_attention_cargo_kv_unit") + self.assertIn( + "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_cargo_kv_unit.py", + planned[0].ci_commands[0]["command"], + ) + self.assertIn("--case-config __RUN_DIR__/configs/ci_scene_config.yaml", planned[0].ci_commands[0]["command"]) + + def test_additional_top_attention_cargo_ci_execution_plans_are_runner_native(self) -> None: + suite_cfg = yaml.safe_load((_RUNNER.RUNNER_REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").read_text(encoding="utf-8")) + suite = _RUNNER._parse_suite_config(suite_cfg) + cases = _RUNNER._expand_cases(suite) + expected = { + "ci_top_attention_cargo_cli": ("top_attention_cargo_cli", "_cargo_cli.py"), + "ci_top_attention_cargo_commu": ("top_attention_cargo_commu", "_cargo_commu.py"), + "ci_top_attention_cargo_commu_contract": ("top_attention_cargo_commu_contract", "_cargo_commu_contract.py"), + "ci_top_attention_cargo_framework": ("top_attention_cargo_framework", "_cargo_framework.py"), + "ci_top_attention_cargo_fs": ("top_attention_cargo_fs", "_cargo_fs.py"), + "ci_top_attention_cargo_fs_s3_gateway": ("top_attention_cargo_fs_s3_gateway", "_cargo_fs_s3_gateway.py"), + "ci_top_attention_cargo_limit_thirdparty": ("top_attention_cargo_limit_thirdparty", 
"_cargo_limit_thirdparty.py"), + "ci_top_attention_cargo_mq": ("top_attention_cargo_mq", "_cargo_mq.py"), + "ci_top_attention_cargo_observability": ("top_attention_cargo_observability", "_cargo_observability.py"), + "ci_top_attention_cargo_ops": ("top_attention_cargo_ops", "_cargo_ops.py"), + "ci_top_attention_cargo_pyo3": ("top_attention_cargo_pyo3", "_cargo_pyo3.py"), + } + for scene_id, (command_id, script_name) in expected.items(): + with self.subTest(scene_id=scene_id): + case = next(item for item in cases if item.scene_id == scene_id and item.profile_id == "fluxon_tcp") + planned = _RUNNER._build_ci_execution_plan(case, suite) + self.assertEqual(len(planned), 1) + self.assertEqual(planned[0].ci_commands[0]["id"], command_id) + self.assertIn( + f"__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/{script_name}", + planned[0].ci_commands[0]["command"], + ) + self.assertNotIn("--case-config", planned[0].ci_commands[0]["command"]) + def test_top_attention_log_mgmt_ci_execution_plan_is_runner_native(self) -> None: suite_cfg = yaml.safe_load((_RUNNER.RUNNER_REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").read_text(encoding="utf-8")) artifact_sets = suite_cfg.get("artifact_sets") @@ -333,7 +1068,6 @@ def test_top_attention_log_mgmt_ci_execution_plan_is_runner_native(self) -> None self.assertEqual(planned[0].ci_commands[0]["id"], "top_attention_log_mgmt") self.assertIn( "__RUN_DIR__/src/fluxon_test_stack/top_attention_test_index/_log_mgmt.py", - planned[0].ci_commands[0]["command"], ) self.assertIn("--case-config __RUN_DIR__/configs/ci_scene_config.yaml", planned[0].ci_commands[0]["command"]) @@ -352,6 +1086,36 @@ def test_top_attention_mq_core_ci_execution_plan_is_runner_native(self) -> None: ) self.assertIn("--case-config __RUN_DIR__/configs/ci_scene_config.yaml", planned[0].ci_commands[0]["command"]) + def test_top_attention_mq_core_ci_plan_has_no_collect_phase(self) -> None: + resolved_case = { + "case": { + "family": "ci", + "case_id": 
"ci_top_attention_mq_core__n1_kvowner_dram_20gib__fluxon_tcp_thread", + }, + "scene": { + "ci": { + "runtime_contract": "cluster_kv_owner", + "subject": "mq", + }, + }, + "deploy": { + "instances": [ + {"id": "master"}, + {"id": "owner_0"}, + {"id": "ci_runner"}, + ], + }, + "runtime_model": { + "test_bed": {"kind": "ops"}, + "base_runtime": {}, + "case_runtime": {"instance_ids": ["master", "owner_0", "ci_runner"]}, + }, + } + case_plan = _RUNNER._compile_case_plan(resolved_case) + self.assertEqual(tuple(phase.phase_id for phase in case_plan.prepare_phases), ("cluster_runtime",)) + self.assertEqual(tuple(phase.phase_id for phase in case_plan.execute_phases), ("ci_runner",)) + self.assertEqual(case_plan.execute_phases[0].instance_ids, ("ci_runner",)) + def test_doc_page_ci_execution_plan_uses_online_docker_image(self) -> None: suite_cfg = yaml.safe_load((_RUNNER.RUNNER_REPO_ROOT / "fluxon_test_stack" / "ci_test_list.yaml").read_text(encoding="utf-8")) suite = _RUNNER._parse_suite_config(suite_cfg) diff --git a/fluxon_test_stack/tests/test_top_attention_bin_kvtest_contract.py b/fluxon_test_stack/tests/test_top_attention_bin_kvtest_contract.py index 4d9b39c..e36a0d0 100644 --- a/fluxon_test_stack/tests/test_top_attention_bin_kvtest_contract.py +++ b/fluxon_test_stack/tests/test_top_attention_bin_kvtest_contract.py @@ -97,6 +97,7 @@ def test_main_writes_build_config_ext_and_calls_cargo(self) -> None: run_cargo.call_args.kwargs["env"]["FLUXON_KV_TEST_ROUNDS"], "p2p_only", ) + self.assertNotIn("FLUXON_BUILD_CONFIG_EXT_PATH", run_cargo.call_args.kwargs["env"]) if __name__ == "__main__": diff --git a/fluxon_test_stack/tests/test_top_attention_cargo_fs_core_contract.py b/fluxon_test_stack/tests/test_top_attention_cargo_fs_core_contract.py new file mode 100644 index 0000000..f1cddbe --- /dev/null +++ b/fluxon_test_stack/tests/test_top_attention_cargo_fs_core_contract.py @@ -0,0 +1,59 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import importlib.util 
+import sys +import unittest +from pathlib import Path +from unittest import mock + + +REPO_ROOT = Path(__file__).resolve().parents[2] +MODULE_PATH = REPO_ROOT / "fluxon_test_stack" / "top_attention_test_index" / "_cargo_fs_core.py" + + +def _load_module(): + module_dir = MODULE_PATH.parent + sys.path.insert(0, str(module_dir)) + try: + spec = importlib.util.spec_from_file_location("fluxon_test_stack_top_attention_cargo_fs_core_contract", MODULE_PATH) + assert spec is not None and spec.loader is not None + mod = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = mod + spec.loader.exec_module(mod) + return mod + finally: + if sys.path and sys.path[0] == str(module_dir): + sys.path.pop(0) + + +_ENTRY = _load_module() + + +class TestTopAttentionCargoFsCoreContract(unittest.TestCase): + def test_main_calls_cargo_test_for_fs_core_crate(self) -> None: + with mock.patch.object(_ENTRY, "run_cargo", return_value=0) as run_cargo: + with mock.patch.object(sys, "argv", [str(MODULE_PATH)]): + rc = _ENTRY.main() + + self.assertEqual(rc, 0) + self.assertEqual( + run_cargo.call_args.args[0], + [ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_fs_core" / "Cargo.toml"), + ], + ) + + def test_main_rejects_pytest_style_passthrough_flags(self) -> None: + with mock.patch.object(sys, "argv", [str(MODULE_PATH), "-k", "lease"]): + with self.assertRaises(SystemExit) as cm: + _ENTRY.main() + + self.assertEqual(cm.exception.code, 2) + + +if __name__ == "__main__": + raise SystemExit(unittest.main()) diff --git a/fluxon_test_stack/tests/test_top_attention_cargo_kv_unit_contract.py b/fluxon_test_stack/tests/test_top_attention_cargo_kv_unit_contract.py new file mode 100644 index 0000000..4b82e17 --- /dev/null +++ b/fluxon_test_stack/tests/test_top_attention_cargo_kv_unit_contract.py @@ -0,0 +1,140 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import importlib.util +import os +import sys +import tempfile +import unittest +from pathlib 
import Path +from unittest import mock + +import yaml + + +REPO_ROOT = Path(__file__).resolve().parents[2] +MODULE_PATH = REPO_ROOT / "fluxon_test_stack" / "top_attention_test_index" / "_cargo_kv_unit.py" + + +def _load_module(): + module_dir = MODULE_PATH.parent + sys.path.insert(0, str(module_dir)) + try: + spec = importlib.util.spec_from_file_location("fluxon_test_stack_top_attention_cargo_kv_unit_contract", MODULE_PATH) + assert spec is not None and spec.loader is not None + mod = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = mod + spec.loader.exec_module(mod) + return mod + finally: + if sys.path and sys.path[0] == str(module_dir): + sys.path.pop(0) + + +_ENTRY = _load_module() + + +class TestTopAttentionCargoKvUnitContract(unittest.TestCase): + def test_main_accepts_case_config_and_uses_scene_config_feature(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + cfg_dir = run_dir / "configs" + cfg_dir.mkdir(parents=True) + src_dir = run_dir / "src" + src_dir.mkdir(parents=True) + case_cfg = cfg_dir / "ci_scene_config.yaml" + case_cfg.write_text( + yaml.safe_dump( + { + "case": { + "scene_id": "ci_top_attention_cargo_kv_unit", + "scale_id": "n1_kvowner_dram_20gib", + "profile_id": "fluxon_tcp", + "case_id": "ci_top_attention_cargo_kv_unit__n1_kvowner_dram_20gib__fluxon_tcp", + }, + "scene_config": { + "kv_transport_feature": "tcp_thread_transport", + }, + "scene_runtime": { + "etcd": {"ip": "127.0.0.1", "port": 19180}, + "greptime": {"ip": "127.0.0.1", "port": 19190}, + }, + }, + sort_keys=False, + ), + encoding="utf-8", + ) + + with mock.patch.dict(os.environ, {"FLUXON_KV_TEST_TRANSPORT_FEATURE": "fastws_transport"}, clear=False): + with mock.patch.object(_ENTRY, "run_cargo", return_value=0) as run_cargo: + with mock.patch.object( + sys, + "argv", + [str(MODULE_PATH), "--case-config", str(case_cfg)], + ): + rc = _ENTRY.main() + + self.assertEqual(rc, 0) + build_cfg = yaml.safe_load((src_dir / 
"build_config_ext.yml").read_text(encoding="utf-8")) + self.assertEqual( + build_cfg, + { + "etcd": "127.0.0.1:19180", + "prom": "http://127.0.0.1:19190/v1/prometheus", + "prom_remote_write_url": "http://127.0.0.1:19190/v1/prometheus/write", + }, + ) + self.assertEqual( + run_cargo.call_args.args[0], + [ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_kv" / "Cargo.toml"), + "--no-default-features", + "--features", + "p2p_transfer,tcp_thread_transport", + ], + ) + self.assertNotIn("env", run_cargo.call_args.kwargs) + + def test_main_rejects_feature_override_flag(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + cfg_dir = run_dir / "configs" + cfg_dir.mkdir(parents=True) + case_cfg = cfg_dir / "ci_scene_config.yaml" + case_cfg.write_text( + yaml.safe_dump( + { + "case": {"scene_id": "ci_top_attention_cargo_kv_unit"}, + "scene_config": {"kv_transport_feature": "tcp_thread_transport"}, + "scene_runtime": { + "etcd": {"ip": "127.0.0.1", "port": 19180}, + "greptime": {"ip": "127.0.0.1", "port": 19190}, + }, + }, + sort_keys=False, + ), + encoding="utf-8", + ) + with mock.patch.object( + sys, + "argv", + [str(MODULE_PATH), "--case-config", str(case_cfg), "--feature", "fastws_transport"], + ): + with self.assertRaises(SystemExit) as cm: + _ENTRY.main() + + self.assertEqual(cm.exception.code, 2) + + def test_main_rejects_pytest_style_passthrough_flags(self) -> None: + with mock.patch.object(sys, "argv", [str(MODULE_PATH), "-k", "lease"]): + with self.assertRaises(SystemExit) as cm: + _ENTRY.main() + + self.assertEqual(cm.exception.code, 2) + + +if __name__ == "__main__": + raise SystemExit(unittest.main()) diff --git a/fluxon_test_stack/tests/test_top_attention_cargo_tikv_contract.py b/fluxon_test_stack/tests/test_top_attention_cargo_tikv_contract.py new file mode 100644 index 0000000..63b9686 --- /dev/null +++ b/fluxon_test_stack/tests/test_top_attention_cargo_tikv_contract.py @@ -0,0 +1,72 @@ +#!/usr/bin/env python3 + 
+from __future__ import annotations + +import importlib.util +import sys +import unittest +from pathlib import Path +from unittest import mock + + +REPO_ROOT = Path(__file__).resolve().parents[2] +INDEX_DIR = REPO_ROOT / "fluxon_test_stack" / "top_attention_test_index" + +MODULE_SPECS = { + "_cargo_fs.py": "fluxon_rs/fluxon_fs/Cargo.toml", + "_cargo_fs_s3_gateway.py": "fluxon_rs/fluxon_fs_s3_gateway/Cargo.toml", +} + + +def _load_module(module_name: str): + module_path = INDEX_DIR / module_name + module_dir = module_path.parent + sys.path.insert(0, str(module_dir)) + try: + spec = importlib.util.spec_from_file_location( + f"fluxon_test_stack_{module_path.stem}_contract", + module_path, + ) + assert spec is not None and spec.loader is not None + mod = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = mod + spec.loader.exec_module(mod) + return mod + finally: + if sys.path and sys.path[0] == str(module_dir): + sys.path.pop(0) + + +class TestTopAttentionCargoTikvContract(unittest.TestCase): + def test_main_calls_cargo_test_for_expected_manifest(self) -> None: + for module_name, manifest_relpath in MODULE_SPECS.items(): + with self.subTest(module_name=module_name): + entry = _load_module(module_name) + module_path = INDEX_DIR / module_name + with mock.patch.object(entry, "run_cargo", return_value=0) as run_cargo: + with mock.patch.object(sys, "argv", [str(module_path)]): + rc = entry.main() + + self.assertEqual(rc, 0) + self.assertEqual( + run_cargo.call_args.args[0], + [ + "test", + "--manifest-path", + str(REPO_ROOT / manifest_relpath), + ], + ) + + def test_main_rejects_pytest_style_passthrough_flags(self) -> None: + for module_name in MODULE_SPECS: + with self.subTest(module_name=module_name): + entry = _load_module(module_name) + module_path = INDEX_DIR / module_name + with mock.patch.object(sys, "argv", [str(module_path), "-k", "lease"]): + with self.assertRaises(SystemExit) as cm: + entry.main() + self.assertEqual(cm.exception.code, 2) + + +if 
__name__ == "__main__": + raise SystemExit(unittest.main()) diff --git a/fluxon_test_stack/tests/test_top_attention_cargo_util_contract.py b/fluxon_test_stack/tests/test_top_attention_cargo_util_contract.py new file mode 100644 index 0000000..028bd8d --- /dev/null +++ b/fluxon_test_stack/tests/test_top_attention_cargo_util_contract.py @@ -0,0 +1,99 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import importlib.util +import sys +import tempfile +import unittest +from pathlib import Path +from unittest import mock + +import yaml + + +REPO_ROOT = Path(__file__).resolve().parents[2] +MODULE_PATH = REPO_ROOT / "fluxon_test_stack" / "top_attention_test_index" / "_cargo_util.py" + + +def _load_module(): + module_dir = MODULE_PATH.parent + sys.path.insert(0, str(module_dir)) + try: + spec = importlib.util.spec_from_file_location("fluxon_test_stack_top_attention_cargo_util_contract", MODULE_PATH) + assert spec is not None and spec.loader is not None + mod = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = mod + spec.loader.exec_module(mod) + return mod + finally: + if sys.path and sys.path[0] == str(module_dir): + sys.path.pop(0) + + +_ENTRY = _load_module() + + +class TestTopAttentionCargoUtilContract(unittest.TestCase): + def test_main_accepts_case_config_and_writes_build_config_ext(self) -> None: + with tempfile.TemporaryDirectory() as td: + run_dir = Path(td) + cfg_dir = run_dir / "configs" + cfg_dir.mkdir(parents=True) + src_dir = run_dir / "src" + src_dir.mkdir(parents=True) + case_cfg = cfg_dir / "ci_scene_config.yaml" + case_cfg.write_text( + yaml.safe_dump( + { + "case": { + "scene_id": "ci_top_attention_cargo_util", + "scale_id": "n1_kvowner_dram_20gib", + "profile_id": "fluxon_tcp", + "case_id": "ci_top_attention_cargo_util__n1_kvowner_dram_20gib__fluxon_tcp", + }, + "scene_config": {}, + "scene_runtime": { + "etcd": {"ip": "127.0.0.1", "port": 19180}, + "greptime": {"ip": "127.0.0.1", "port": 19190}, + }, + }, + 
sort_keys=False, + ), + encoding="utf-8", + ) + + with mock.patch.object(_ENTRY, "run_cargo", return_value=0) as run_cargo: + with mock.patch.object(sys, "argv", [str(MODULE_PATH), "--case-config", str(case_cfg)]): + rc = _ENTRY.main() + + self.assertEqual(rc, 0) + build_cfg = yaml.safe_load((src_dir / "build_config_ext.yml").read_text(encoding="utf-8")) + self.assertEqual( + build_cfg, + { + "etcd": "127.0.0.1:19180", + "prom": "http://127.0.0.1:19190/v1/prometheus", + "prom_remote_write_url": "http://127.0.0.1:19190/v1/prometheus/write", + }, + ) + self.assertEqual( + run_cargo.call_args.args[0], + [ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_util" / "Cargo.toml"), + ], + ) + self.assertNotIn("env", run_cargo.call_args.kwargs) + + def test_main_rejects_pytest_style_passthrough_flags(self) -> None: + with mock.patch.object(sys, "argv", [str(MODULE_PATH), "-k", "lease"]): + with self.assertRaises(SystemExit) as cm: + _ENTRY.main() + + self.assertEqual(cm.exception.code, 2) + + +if __name__ == "__main__": + raise SystemExit(unittest.main()) diff --git a/fluxon_test_stack/tests/test_top_attention_cargo_workspace_contract.py b/fluxon_test_stack/tests/test_top_attention_cargo_workspace_contract.py new file mode 100644 index 0000000..66a600d --- /dev/null +++ b/fluxon_test_stack/tests/test_top_attention_cargo_workspace_contract.py @@ -0,0 +1,79 @@ +#!/usr/bin/env python3 + +from __future__ import annotations + +import importlib.util +import sys +import unittest +from pathlib import Path +from unittest import mock + + +REPO_ROOT = Path(__file__).resolve().parents[2] +INDEX_DIR = REPO_ROOT / "fluxon_test_stack" / "top_attention_test_index" + +MODULE_SPECS = { + "_cargo_cli.py": "fluxon_rs/fluxon_cli/Cargo.toml", + "_cargo_commu.py": "fluxon_rs/fluxon_commu/Cargo.toml", + "_cargo_commu_contract.py": "fluxon_rs/fluxon_commu_contract/Cargo.toml", + "_cargo_framework.py": "fluxon_rs/fluxon_framework/Cargo.toml", + "_cargo_limit_thirdparty.py": 
"fluxon_rs/limit_thirdparty/Cargo.toml", + "_cargo_mq.py": "fluxon_rs/fluxon_mq/Cargo.toml", + "_cargo_observability.py": "fluxon_rs/fluxon_observability/Cargo.toml", + "_cargo_ops.py": "fluxon_rs/fluxon_ops/Cargo.toml", + "_cargo_pyo3.py": "fluxon_rs/fluxon_pyo3/Cargo.toml", +} + + +def _load_module(module_name: str): + module_path = INDEX_DIR / module_name + module_dir = module_path.parent + sys.path.insert(0, str(module_dir)) + try: + spec = importlib.util.spec_from_file_location( + f"fluxon_test_stack_{module_path.stem}_contract", + module_path, + ) + assert spec is not None and spec.loader is not None + mod = importlib.util.module_from_spec(spec) + sys.modules[spec.name] = mod + spec.loader.exec_module(mod) + return mod + finally: + if sys.path and sys.path[0] == str(module_dir): + sys.path.pop(0) + + +class TestTopAttentionCargoWorkspaceContract(unittest.TestCase): + def test_main_calls_cargo_test_for_expected_manifest(self) -> None: + for module_name, manifest_relpath in MODULE_SPECS.items(): + with self.subTest(module_name=module_name): + entry = _load_module(module_name) + module_path = INDEX_DIR / module_name + with mock.patch.object(entry, "run_cargo", return_value=0) as run_cargo: + with mock.patch.object(sys, "argv", [str(module_path)]): + rc = entry.main() + + self.assertEqual(rc, 0) + self.assertEqual( + run_cargo.call_args.args[0], + [ + "test", + "--manifest-path", + str(REPO_ROOT / manifest_relpath), + ], + ) + + def test_main_rejects_pytest_style_passthrough_flags(self) -> None: + for module_name in MODULE_SPECS: + with self.subTest(module_name=module_name): + entry = _load_module(module_name) + module_path = INDEX_DIR / module_name + with mock.patch.object(sys, "argv", [str(module_path), "-k", "lease"]): + with self.assertRaises(SystemExit) as cm: + entry.main() + self.assertEqual(cm.exception.code, 2) + + +if __name__ == "__main__": + raise SystemExit(unittest.main()) diff --git a/fluxon_test_stack/tests/test_top_attention_common_contract.py 
b/fluxon_test_stack/tests/test_top_attention_common_contract.py index 924269d..f9e2daf 100644 --- a/fluxon_test_stack/tests/test_top_attention_common_contract.py +++ b/fluxon_test_stack/tests/test_top_attention_common_contract.py @@ -27,7 +27,22 @@ def _load_module(): class TestTopAttentionCommonContract(unittest.TestCase): - def test_prepare_cargo_env_prefers_active_fluxon_pyo3_libs_dir(self) -> None: + def test_prepare_cargo_env_preserves_parent_path_when_case_env_is_partial(self) -> None: + with mock.patch.object(_ENTRY, "_resolve_authoritative_fluxon_pyo3_libs_dir", return_value=None): + with mock.patch.object(_ENTRY, "_resolve_repo_closed_sdk_root", return_value=None): + with mock.patch.dict( + _ENTRY.os.environ, + {"PATH": "/usr/local/bin:/usr/bin:/bin", "HOME": "/tmp/fluxon-test-home"}, + clear=True, + ): + prepared_env = _ENTRY._prepare_cargo_env({"FLUXON_KV_TEST_ROUNDS": "p2p_only"}) + + assert prepared_env is not None + self.assertEqual(prepared_env["PATH"], "/usr/local/bin:/usr/bin:/bin") + self.assertEqual(prepared_env["HOME"], "/tmp/fluxon-test-home") + self.assertEqual(prepared_env["FLUXON_KV_TEST_ROUNDS"], "p2p_only") + + def test_prepare_cargo_env_prefers_active_fluxon_pyo3_libs_dir_and_sanitizes_loader_path(self) -> None: with tempfile.TemporaryDirectory() as td: root = Path(td) active_site_packages = root / "venv" / "lib" / "python3.12" / "site-packages" @@ -46,18 +61,79 @@ def test_prepare_cargo_env_prefers_active_fluxon_pyo3_libs_dir(self) -> None: ): with mock.patch.object(_ENTRY.site, "getsitepackages", return_value=[str(stale_libs_dir.parent)]): with mock.patch.object(_ENTRY.site, "getusersitepackages", return_value=""): - prepared_env = _ENTRY._prepare_cargo_env( - { - "LD_LIBRARY_PATH": f"{stale_libs_dir}:/usr/lib:/opt/custom", - "PATH": "/usr/bin", - } - ) + with mock.patch.object(_ENTRY, "_resolve_repo_closed_sdk_root", return_value=None): + prepared_env = _ENTRY._prepare_cargo_env( + { + "LD_LIBRARY_PATH": 
f"{stale_libs_dir}:/usr/lib:/opt/custom", + "PATH": "/usr/bin", + } + ) assert prepared_env is not None self.assertEqual(prepared_env["FLUXON_PYO3_LIBS_DIR"], str(active_libs_dir.resolve())) + self.assertEqual(prepared_env["LD_LIBRARY_PATH"], f"{active_libs_dir.resolve()}:/usr/lib:/opt/custom") + self.assertEqual(prepared_env["PATH"], "/usr/bin") + + def test_prepare_cargo_env_places_authoritative_fluxon_root_before_closed_sdk_runtime(self) -> None: + with tempfile.TemporaryDirectory() as td: + root = Path(td) + active_site_packages = root / "venv" / "lib" / "python3.12" / "site-packages" + active_libs_dir = active_site_packages / "fluxon_pyo3.libs" + active_libs_dir.mkdir(parents=True) + closed_sdk_root = root / "fluxon_release" / "closed_sdk" + (closed_sdk_root / "lib").mkdir(parents=True) + (closed_sdk_root / "manifest.json").write_text("{}", encoding="utf-8") + stale_libs_dir = root / "stale" / "site-packages" / "fluxon_pyo3.libs" + stale_libs_dir.mkdir(parents=True) + + with mock.patch.object( + _ENTRY.sysconfig, + "get_paths", + return_value={ + "platlib": str(active_site_packages), + "purelib": str(active_site_packages), + }, + ): + with mock.patch.object(_ENTRY.site, "getsitepackages", return_value=[str(stale_libs_dir.parent)]): + with mock.patch.object(_ENTRY.site, "getusersitepackages", return_value=""): + with mock.patch.object(_ENTRY, "REPO_ROOT", root): + prepared_env = _ENTRY._prepare_cargo_env( + { + "LD_LIBRARY_PATH": f"{stale_libs_dir}:/usr/lib:/opt/custom", + "PATH": "/usr/bin", + } + ) + + assert prepared_env is not None + self.assertEqual(prepared_env["FLUXON_PYO3_LIBS_DIR"], str(active_libs_dir.resolve())) + self.assertEqual(prepared_env["FLUXON_COMMU_CLOSED_SDK_ROOT"], str(closed_sdk_root.resolve())) + self.assertEqual( + prepared_env["LD_LIBRARY_PATH"], + f"{active_libs_dir.resolve()}:{(closed_sdk_root / 'lib').resolve()}:/usr/lib:/opt/custom", + ) + self.assertEqual(prepared_env["PATH"], "/usr/bin") + + def 
test_prepare_cargo_env_prepends_repo_closed_sdk_runtime(self) -> None: + with tempfile.TemporaryDirectory() as td: + root = Path(td) + closed_sdk_root = root / "fluxon_release" / "closed_sdk" + (closed_sdk_root / "lib").mkdir(parents=True) + (closed_sdk_root / "manifest.json").write_text("{}", encoding="utf-8") + + with mock.patch.object(_ENTRY, "REPO_ROOT", root): + with mock.patch.object(_ENTRY, "_resolve_authoritative_fluxon_pyo3_libs_dir", return_value=None): + prepared_env = _ENTRY._prepare_cargo_env( + { + "LD_LIBRARY_PATH": "/usr/lib:/opt/custom", + "PATH": "/usr/bin", + } + ) + + assert prepared_env is not None + self.assertEqual(prepared_env["FLUXON_COMMU_CLOSED_SDK_ROOT"], str(closed_sdk_root.resolve())) self.assertEqual( prepared_env["LD_LIBRARY_PATH"], - f"{active_libs_dir.resolve()}:/usr/lib:/opt/custom", + f"{(closed_sdk_root / 'lib').resolve()}:/usr/lib:/opt/custom", ) self.assertEqual(prepared_env["PATH"], "/usr/bin") diff --git a/fluxon_test_stack/tests/test_top_attention_log_mgmt_contract.py b/fluxon_test_stack/tests/test_top_attention_log_mgmt_contract.py index 2b92fd0..b38cd42 100644 --- a/fluxon_test_stack/tests/test_top_attention_log_mgmt_contract.py +++ b/fluxon_test_stack/tests/test_top_attention_log_mgmt_contract.py @@ -255,6 +255,8 @@ def test_run_cargo_does_not_forward_parent_passthrough(self) -> None: str(REPO_ROOT / "fluxon_rs" / "fluxon_util" / "Cargo.toml"), "--test", "log_mgmt", + "--", + "--test-threads=1", ], ) diff --git a/fluxon_test_stack/top_attention_test_index/README.md b/fluxon_test_stack/top_attention_test_index/README.md index e36b326..19069f2 100644 --- a/fluxon_test_stack/top_attention_test_index/README.md +++ b/fluxon_test_stack/top_attention_test_index/README.md @@ -49,9 +49,20 @@ Entries: - `_deployment_codegen.py`: deployment code generation coverage - `_log_mgmt.py`: shared-supervisor ops log rolling plus Rust KV log sharding coverage. 
`ci_test_list.yaml` now exposes this wrapper as the formal `ci_top_attention_log_mgmt` scene, and `test_runner.py` dispatches to it from the runner-native `top_attention` CI execution model. - `_script_tools.py`: script utility coverage -- `_cargo_fs_core.py`: cargo tests for the Rust FS core crate -- `_cargo_util.py`: cargo tests for the Rust util crate -- `_cargo_kv_unit.py`: cargo tests for the Rust KV crate +- `_cargo_fs_core.py`: cargo tests for the Rust FS core crate. `ci_test_list.yaml` now exposes this wrapper as the formal `ci_top_attention_cargo_fs_core` runner-native scene. +- `_cargo_util.py`: cargo tests for the Rust util crate. `ci_test_list.yaml` now exposes this wrapper as the formal `ci_top_attention_cargo_util` runner-native scene, with runtime endpoints supplied through canonical `--case-config`. +- `_cargo_kv_unit.py`: cargo tests for the Rust KV crate. `ci_test_list.yaml` now exposes this wrapper as the formal `ci_top_attention_cargo_kv_unit` runner-native scene, with transport feature selection sourced only from canonical `--case-config` (`scene_config.kv_transport_feature`). +- `_cargo_cli.py`: cargo tests for the Rust CLI crate +- `_cargo_commu.py`: cargo tests for the Rust communication facade crate +- `_cargo_commu_contract.py`: cargo tests for the Rust communication contract crate +- `_cargo_framework.py`: cargo tests for the Rust framework crate +- `_cargo_fs.py`: cargo tests for the Rust FS crate. This wrapper expects the prepared `fluxon_release/ext_images/tikv/*` runtime files. +- `_cargo_fs_s3_gateway.py`: cargo tests for the Rust FS S3 gateway crate. This wrapper expects the prepared `fluxon_release/ext_images/tikv/*` runtime files. 
+- `_cargo_limit_thirdparty.py`: cargo tests for the Rust third-party facade crate +- `_cargo_mq.py`: cargo tests for the Rust MQ crate +- `_cargo_observability.py`: cargo tests for the Rust observability crate +- `_cargo_ops.py`: cargo tests for the Rust ops crate +- `_cargo_pyo3.py`: cargo tests for the Rust PyO3 crate Operational note: @@ -60,6 +71,9 @@ Operational note: provide at least 308 common non-bastion deploy targets in `target_ip_map` for the default 300-producer/8-consumer topology; pass `--config` for the large cluster suite before running it. +- All `_cargo_*.py` wrappers are direct-process entrypoints. They do not forward + `pytest` selectors or `cargo test` passthrough flags unless the wrapper + explicitly defines that surface. Known gap: diff --git a/fluxon_test_stack/top_attention_test_index/_bin_kvtest.py b/fluxon_test_stack/top_attention_test_index/_bin_kvtest.py index faddb51..7c2c02a 100644 --- a/fluxon_test_stack/top_attention_test_index/_bin_kvtest.py +++ b/fluxon_test_stack/top_attention_test_index/_bin_kvtest.py @@ -5,9 +5,12 @@ import os from pathlib import Path -import yaml - -from _common import REPO_ROOT, load_case_config_payload, run_cargo +from _common import ( + REPO_ROOT, + load_case_config_payload, + run_cargo, + write_build_config_ext, +) TEST_REQUIREMENTS = ["cargo", "etcd", "ops", "submodules"] @@ -33,38 +36,6 @@ def _parse_kv_test_rounds(raw: object) -> str: return ",".join(rounds) -def _require_scene_runtime_endpoint(scene_runtime: object, *, service_id: str) -> tuple[str, int]: - if not isinstance(scene_runtime, dict): - raise ValueError("case config scene_runtime must be a mapping") - raw_service = scene_runtime.get(service_id) - if not isinstance(raw_service, dict): - raise ValueError(f"case config scene_runtime.{service_id} must be a mapping") - ip = str(raw_service.get("ip") or "").strip() - if not ip: - raise ValueError(f"case config scene_runtime.{service_id}.ip must be set") - port = raw_service.get("port") - if not 
isinstance(port, int): - raise ValueError(f"case config scene_runtime.{service_id}.port must be an int") - return ip, port - - -def _write_build_config_ext(case_cfg_path: Path, scene_runtime: dict) -> None: - etcd_ip, etcd_port = _require_scene_runtime_endpoint(scene_runtime, service_id="etcd") - greptime_ip, greptime_port = _require_scene_runtime_endpoint(scene_runtime, service_id="greptime") - out_path = case_cfg_path.resolve().parents[1] / "src" / "build_config_ext.yml" - out_path.write_text( - yaml.safe_dump( - { - "etcd": f"{etcd_ip}:{etcd_port}", - "prom": f"http://{greptime_ip}:{greptime_port}/v1/prometheus", - "prom_remote_write_url": f"http://{greptime_ip}:{greptime_port}/v1/prometheus/write", - }, - sort_keys=False, - ), - encoding="utf-8", - ) - - def main() -> int: parser = argparse.ArgumentParser( description="Flat index entry for the existing Rust kv_test binary." @@ -85,7 +56,7 @@ def main() -> int: scene_runtime = case_payload.get("scene_runtime") if not isinstance(scene_runtime, dict): raise ValueError("case config must define scene_runtime mapping") - _write_build_config_ext(case_cfg_path, scene_runtime) + write_build_config_ext(case_cfg_path, scene_runtime=scene_runtime) cargo_args = [ "run", @@ -99,9 +70,9 @@ def main() -> int: ] if passthrough: cargo_args.extend(["--", *passthrough]) - env = None + env: dict[str, str] | None = None if rounds != "all": - env = os.environ.copy() + env = {} env["FLUXON_KV_TEST_ROUNDS"] = rounds return run_cargo(cargo_args, env=env) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_cli.py b/fluxon_test_stack/top_attention_test_index/_cargo_cli.py new file mode 100644 index 0000000..cf56b5e --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_cli.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = 
argparse.ArgumentParser( + description="Flat index entry for Rust CLI crate tests." + ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_cli" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_commu.py b/fluxon_test_stack/top_attention_test_index/_cargo_commu.py new file mode 100644 index 0000000..e1fd14c --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_commu.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust communication facade crate tests." + ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_commu" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_commu_contract.py b/fluxon_test_stack/top_attention_test_index/_cargo_commu_contract.py new file mode 100644 index 0000000..8e15c4f --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_commu_contract.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust communication contract crate tests." 
+ ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_commu_contract" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_framework.py b/fluxon_test_stack/top_attention_test_index/_cargo_framework.py new file mode 100644 index 0000000..a6430de --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_framework.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust framework crate tests." + ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_framework" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_fs.py b/fluxon_test_stack/top_attention_test_index/_cargo_fs.py new file mode 100644 index 0000000..38eb589 --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_fs.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "fluxon-release", "ops", "submodules", "tikv"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust FS crate tests." 
+ ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_fs" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_fs_core.py b/fluxon_test_stack/top_attention_test_index/_cargo_fs_core.py index cbca6f5..0af437c 100755 --- a/fluxon_test_stack/top_attention_test_index/_cargo_fs_core.py +++ b/fluxon_test_stack/top_attention_test_index/_cargo_fs_core.py @@ -1,6 +1,8 @@ #!/usr/bin/env python3 from __future__ import annotations +import argparse + from _common import REPO_ROOT, run_cargo @@ -8,6 +10,10 @@ def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust FS core crate tests." + ) + parser.parse_args() return run_cargo([ "test", "--manifest-path", diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_fs_s3_gateway.py b/fluxon_test_stack/top_attention_test_index/_cargo_fs_s3_gateway.py new file mode 100644 index 0000000..4b7d89e --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_fs_s3_gateway.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "fluxon-release", "ops", "submodules", "tikv"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust FS S3 gateway crate tests." 
+ ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_fs_s3_gateway" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_kv_unit.py b/fluxon_test_stack/top_attention_test_index/_cargo_kv_unit.py index 36ae5ff..ba91e43 100755 --- a/fluxon_test_stack/top_attention_test_index/_cargo_kv_unit.py +++ b/fluxon_test_stack/top_attention_test_index/_cargo_kv_unit.py @@ -2,12 +2,18 @@ from __future__ import annotations import argparse -import os +from pathlib import Path -from _common import REPO_ROOT, run_cargo +from _common import ( + REPO_ROOT, + load_case_config_payload, + run_cargo, + write_build_config_ext, +) TEST_REQUIREMENTS = ["cargo", "etcd", "ops", "submodules"] +SCENE_ID = "ci_top_attention_cargo_kv_unit" def main() -> int: @@ -15,19 +21,28 @@ def main() -> int: description="Flat index entry for Rust KV crate unit tests." ) parser.add_argument( - "--feature", - default=os.environ.get("FLUXON_KV_TEST_TRANSPORT_FEATURE", "tcp_thread_transport"), - help="Transport feature appended to p2p_transfer.", + "--case-config", + required=True, + help="Canonical CI case config YAML emitted by test_runner.", ) - args, passthrough = parser.parse_known_args() + args = parser.parse_args() + case_cfg_path = Path(args.case_config).resolve() + case_payload = load_case_config_payload(case_cfg_path, expected_scene_id=SCENE_ID) + scene_config = case_payload["scene_config"] + feature = str(scene_config.get("kv_transport_feature") or "").strip() + if not feature: + raise ValueError("scene_config.kv_transport_feature must be set") + scene_runtime = case_payload.get("scene_runtime") + if not isinstance(scene_runtime, dict): + raise ValueError("case config must define scene_runtime mapping") + write_build_config_ext(case_cfg_path, scene_runtime=scene_runtime) return run_cargo([ "test", "--manifest-path", str(REPO_ROOT / "fluxon_rs" / 
"fluxon_kv" / "Cargo.toml"), "--no-default-features", "--features", - f"p2p_transfer,{args.feature}", - *passthrough, + f"p2p_transfer,{feature}", ]) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_limit_thirdparty.py b/fluxon_test_stack/top_attention_test_index/_cargo_limit_thirdparty.py new file mode 100644 index 0000000..1ef196b --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_limit_thirdparty.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust limit_thirdparty crate tests." + ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "limit_thirdparty" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_mq.py b/fluxon_test_stack/top_attention_test_index/_cargo_mq.py new file mode 100644 index 0000000..aab3ff5 --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_mq.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust MQ crate tests." 
+ ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_mq" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_observability.py b/fluxon_test_stack/top_attention_test_index/_cargo_observability.py new file mode 100644 index 0000000..5a5ee96 --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_observability.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust observability crate tests." + ) + parser.parse_args() + return run_cargo([ + "test", + "--manifest-path", + str(REPO_ROOT / "fluxon_rs" / "fluxon_observability" / "Cargo.toml"), + ]) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_ops.py b/fluxon_test_stack/top_attention_test_index/_cargo_ops.py new file mode 100644 index 0000000..ffd1a11 --- /dev/null +++ b/fluxon_test_stack/top_attention_test_index/_cargo_ops.py @@ -0,0 +1,25 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse + +from _common import REPO_ROOT, run_cargo + + +TEST_REQUIREMENTS = ["cargo", "ops", "submodules"] + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Flat index entry for Rust ops crate tests." 
+    )
+    parser.parse_args()
+    return run_cargo([
+        "test",
+        "--manifest-path",
+        str(REPO_ROOT / "fluxon_rs" / "fluxon_ops" / "Cargo.toml"),
+    ])
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_pyo3.py b/fluxon_test_stack/top_attention_test_index/_cargo_pyo3.py
new file mode 100644
index 0000000..4ee9a4c
--- /dev/null
+++ b/fluxon_test_stack/top_attention_test_index/_cargo_pyo3.py
@@ -0,0 +1,25 @@
+#!/usr/bin/env python3
+from __future__ import annotations
+
+import argparse
+
+from _common import REPO_ROOT, run_cargo
+
+
+TEST_REQUIREMENTS = ["cargo", "ops", "submodules"]
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(
+        description="Flat index entry for Rust PyO3 crate tests."
+    )
+    parser.parse_args()
+    return run_cargo([
+        "test",
+        "--manifest-path",
+        str(REPO_ROOT / "fluxon_rs" / "fluxon_pyo3" / "Cargo.toml"),
+    ])
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/fluxon_test_stack/top_attention_test_index/_cargo_util.py b/fluxon_test_stack/top_attention_test_index/_cargo_util.py
index 2e707c8..e128221 100755
--- a/fluxon_test_stack/top_attention_test_index/_cargo_util.py
+++ b/fluxon_test_stack/top_attention_test_index/_cargo_util.py
@@ -1,13 +1,36 @@
 #!/usr/bin/env python3
 from __future__ import annotations

-from _common import REPO_ROOT, run_cargo
+import argparse
+from pathlib import Path
+from _common import (
+    REPO_ROOT,
+    load_case_config_payload,
+    run_cargo,
+    write_build_config_ext,
+)


 TEST_REQUIREMENTS = ["cargo", "etcd", "ops", "submodules"]
+SCENE_ID = "ci_top_attention_cargo_util"


 def main() -> int:
+    parser = argparse.ArgumentParser(
+        description="Flat index entry for Rust util crate tests."
+    )
+    parser.add_argument(
+        "--case-config",
+        help="Canonical CI case config YAML emitted by test_runner.",
+    )
+    args = parser.parse_args()
+    if args.case_config:
+        case_cfg_path = Path(args.case_config).resolve()
+        case_payload = load_case_config_payload(case_cfg_path, expected_scene_id=SCENE_ID)
+        scene_runtime = case_payload.get("scene_runtime")
+        if not isinstance(scene_runtime, dict):
+            raise ValueError("case config must define scene_runtime mapping")
+        write_build_config_ext(case_cfg_path, scene_runtime=scene_runtime)
     return run_cargo([
         "test",
         "--manifest-path",
diff --git a/fluxon_test_stack/top_attention_test_index/_common.py b/fluxon_test_stack/top_attention_test_index/_common.py
index c890584..1f2023c 100755
--- a/fluxon_test_stack/top_attention_test_index/_common.py
+++ b/fluxon_test_stack/top_attention_test_index/_common.py
@@ -52,18 +52,6 @@ def run_python_file(
     return call([python, "-u", str(REPO_ROOT / path), *extra_args])


-def run_python_files(
-    description: str,
-    paths: Iterable[str],
-) -> int:
-    python, _ = parse_python_passthrough(description)
-    for path in paths:
-        rc = call([python, "-u", str(REPO_ROOT / path)])
-        if rc != 0:
-            return rc
-    return 0
-
-
 def load_case_config(path: str | Path, *, expected_scene_id: str) -> dict:
     cfg_path = Path(path).resolve()
     raw = yaml.safe_load(cfg_path.read_text(encoding="utf-8"))
@@ -98,8 +86,39 @@ def load_case_config_payload(path: str | Path, *, expected_scene_id: str) -> dic
     return raw


-def _path_contains_fluxon_pyo3_libs_dir(path: Path) -> bool:
-    return "fluxon_pyo3.libs" in path.parts
+def _require_scene_runtime_endpoint(scene_runtime: object, *, service_id: str) -> tuple[str, int]:
+    if not isinstance(scene_runtime, dict):
+        raise ValueError("case config scene_runtime must be a mapping")
+    raw_service = scene_runtime.get(service_id)
+    if not isinstance(raw_service, dict):
+        raise ValueError(f"case config scene_runtime.{service_id} must be a mapping")
+    ip = str(raw_service.get("ip") or
"").strip() + if not ip: + raise ValueError(f"case config scene_runtime.{service_id}.ip must be set") + port = raw_service.get("port") + if not isinstance(port, int): + raise ValueError(f"case config scene_runtime.{service_id}.port must be an int") + return ip, port + + +def write_build_config_ext(case_cfg_path: str | Path, *, scene_runtime: object) -> Path: + cfg_path = Path(case_cfg_path).resolve() + etcd_ip, etcd_port = _require_scene_runtime_endpoint(scene_runtime, service_id="etcd") + greptime_ip, greptime_port = _require_scene_runtime_endpoint(scene_runtime, service_id="greptime") + out_path = cfg_path.parents[1] / "src" / "build_config_ext.yml" + out_path.parent.mkdir(parents=True, exist_ok=True) + out_path.write_text( + yaml.safe_dump( + { + "etcd": f"{etcd_ip}:{etcd_port}", + "prom": f"http://{greptime_ip}:{greptime_port}/v1/prometheus", + "prom_remote_write_url": f"http://{greptime_ip}:{greptime_port}/v1/prometheus/write", + }, + sort_keys=False, + ), + encoding="utf-8", + ) + return out_path def _iter_active_python_site_packages_roots() -> list[Path]: @@ -136,30 +155,74 @@ def _resolve_authoritative_fluxon_pyo3_libs_dir() -> Path | None: return None -def _prepare_cargo_env(env: dict[str, str] | None) -> dict[str, str] | None: - libs_dir = _resolve_authoritative_fluxon_pyo3_libs_dir() - if libs_dir is None: - return None if env is None else dict(env) - - prepared_env = os.environ.copy() if env is None else dict(env) - authoritative_entry = str(libs_dir) - prepared_env["FLUXON_PYO3_LIBS_DIR"] = authoritative_entry - - sanitized_entries = [authoritative_entry] - seen_entries = {authoritative_entry} - current_ld_library_path = prepared_env.get("LD_LIBRARY_PATH") - if current_ld_library_path is not None: - for raw_entry in current_ld_library_path.split(":"): +def _path_contains_fluxon_pyo3_libs_dir(path: Path) -> bool: + return "fluxon_pyo3.libs" in path.parts + + +def _sanitize_cargo_ld_library_path( + *, + authoritative_entries: Sequence[str], + 
    current_value: str | None,
+) -> str:
+    # Keep the authoritative loader roots first, then retain only non-fluxon entries from the parent env.
+    sanitized_entries: list[str] = []
+    seen_entries: set[str] = set()
+    for raw_entry in authoritative_entries:
+        entry = raw_entry.strip()
+        if not entry or entry in seen_entries:
+            continue
+        seen_entries.add(entry)
+        sanitized_entries.append(entry)
+
+    if current_value is not None:
+        for raw_entry in current_value.split(":"):
             entry = raw_entry.strip()
-            if not entry:
-                continue
-            if entry in seen_entries:
+            if not entry or entry in seen_entries:
                 continue
             if _path_contains_fluxon_pyo3_libs_dir(Path(entry)):
                 continue
             seen_entries.add(entry)
             sanitized_entries.append(entry)
-    prepared_env["LD_LIBRARY_PATH"] = ":".join(sanitized_entries)
+    return ":".join(sanitized_entries)
+
+
+def _resolve_repo_closed_sdk_root() -> Path | None:
+    closed_sdk_root = (REPO_ROOT / "fluxon_release" / "closed_sdk").resolve()
+    if not closed_sdk_root.is_dir():
+        return None
+    manifest_path = closed_sdk_root / "manifest.json"
+    lib_dir = closed_sdk_root / "lib"
+    if not manifest_path.is_file() or not lib_dir.is_dir():
+        return None
+    return closed_sdk_root
+
+
+def _prepare_cargo_env(env: dict[str, str] | None) -> dict[str, str] | None:
+    libs_dir = _resolve_authoritative_fluxon_pyo3_libs_dir()
+    closed_sdk_root = _resolve_repo_closed_sdk_root()
+    if env is None and libs_dir is None and closed_sdk_root is None:
+        return None
+
+    prepared_env = os.environ.copy()
+    if env is not None:
+        prepared_env.update(env)
+    authoritative_entries: list[str] = []
+
+    if libs_dir is not None:
+        authoritative_entry = str(libs_dir)
+        prepared_env["FLUXON_PYO3_LIBS_DIR"] = authoritative_entry
+        authoritative_entries.append(authoritative_entry)
+
+    if closed_sdk_root is not None:
+        prepared_env["FLUXON_COMMU_CLOSED_SDK_ROOT"] = str(closed_sdk_root)
+        authoritative_entries.append(str((closed_sdk_root / "lib").resolve()))
+
+    if authoritative_entries:
+
        prepared_env["LD_LIBRARY_PATH"] = _sanitize_cargo_ld_library_path(
+            authoritative_entries=authoritative_entries,
+            current_value=prepared_env.get("LD_LIBRARY_PATH"),
+        )
+
     return prepared_env


@@ -171,5 +234,8 @@ def run_cargo(
 ) -> int:
     # Rust test binaries launched via cargo run/load depend on the wheel-bundled native
     # runtime under the active venv. Keep one authoritative search root for all wrappers.
+    cargo_args = list(args)
     effective_passthrough = [] if passthrough is None else list(passthrough)
-    return call(["cargo", *args, *effective_passthrough], env=_prepare_cargo_env(env))
+    if cargo_args and cargo_args[0] == "test":
+        effective_passthrough = ["--", "--test-threads=1", *effective_passthrough]
+    return call(["cargo", *cargo_args, *effective_passthrough], env=_prepare_cargo_env(env))
diff --git a/fluxon_test_stack/top_attention_test_index/_deployment_codegen.py b/fluxon_test_stack/top_attention_test_index/_deployment_codegen.py
index 8bac6eb..725deff 100755
--- a/fluxon_test_stack/top_attention_test_index/_deployment_codegen.py
+++ b/fluxon_test_stack/top_attention_test_index/_deployment_codegen.py
@@ -1,23 +1,26 @@
 #!/usr/bin/env python3
 from __future__ import annotations

-from _common import run_python_files
+from _common import run_python_file


 TEST_REQUIREMENTS = ["ops"]
+TEST_PATHS = [
+    "deployment/tests/test_gen_bare_deploy_bash.py",
+    "deployment/tests/test_gen_k8s_daemonset.py",
+    "deployment/tests/test_selection_supervisor_codegen.py",
+    "deployment/tests/test_start_test_bed_bootstrap_log.py",
+    "deployment/tests/test_start_test_bed_deploy_payload.py",
+]
+DESCRIPTION = "Flat index entry for deployment codegen tests."
 def main() -> int:
-    return run_python_files(
-        "Flat index entry for deployment codegen tests.",
-        [
-            "deployment/tests/test_gen_bare_deploy_bash.py",
-            "deployment/tests/test_gen_k8s_daemonset.py",
-            "deployment/tests/test_selection_supervisor_codegen.py",
-            "deployment/tests/test_start_test_bed_bootstrap_log.py",
-            "deployment/tests/test_start_test_bed_deploy_payload.py",
-        ],
-    )
+    for path in TEST_PATHS:
+        rc = run_python_file(DESCRIPTION, path)
+        if rc != 0:
+            return rc
+    return 0


 if __name__ == "__main__":
diff --git a/fluxon_test_stack/top_attention_test_index/_py_runtime.py b/fluxon_test_stack/top_attention_test_index/_py_runtime.py
index b9de962..9598ede 100755
--- a/fluxon_test_stack/top_attention_test_index/_py_runtime.py
+++ b/fluxon_test_stack/top_attention_test_index/_py_runtime.py
@@ -1,20 +1,23 @@
 #!/usr/bin/env python3
 from __future__ import annotations

-from _common import run_python_files
+from _common import run_python_file


 TEST_REQUIREMENTS = ["ops"]
+TEST_PATHS = [
+    "fluxon_py/tests/test_process_runner.py",
+    "fluxon_py/tests/test_backend_fallback_close.py",
+]
+DESCRIPTION = "Flat index entry for Python runtime/process tests."
 def main() -> int:
-    return run_python_files(
-        "Flat index entry for Python runtime/process tests.",
-        [
-            "fluxon_py/tests/test_process_runner.py",
-            "fluxon_py/tests/test_backend_fallback_close.py",
-        ],
-    )
+    for path in TEST_PATHS:
+        rc = run_python_file(DESCRIPTION, path)
+        if rc != 0:
+            return rc
+    return 0


 if __name__ == "__main__":
diff --git a/fluxon_test_stack/top_attention_test_index/_script_tools.py b/fluxon_test_stack/top_attention_test_index/_script_tools.py
index 304b7b9..ad53d8b 100755
--- a/fluxon_test_stack/top_attention_test_index/_script_tools.py
+++ b/fluxon_test_stack/top_attention_test_index/_script_tools.py
@@ -1,22 +1,25 @@
 #!/usr/bin/env python3
 from __future__ import annotations

-from _common import run_python_files
+from _common import run_python_file


 TEST_REQUIREMENTS = ["ops"]
+TEST_PATHS = [
+    "setup_and_pack/tests/test_rclone_dist.py",
+    "setup_and_pack/tests/test_rclone_sequential.py",
+    "setup_and_pack/tests/test_roundrobin_buckets.py",
+    "setup_and_pack/tests/test_scan_dir_size_progress.py",
+]
+DESCRIPTION = "Flat index entry for script utility tests."


 def main() -> int:
-    return run_python_files(
-        "Flat index entry for script utility tests.",
-        [
-            "setup_and_pack/tests/test_rclone_dist.py",
-            "setup_and_pack/tests/test_rclone_sequential.py",
-            "setup_and_pack/tests/test_roundrobin_buckets.py",
-            "setup_and_pack/tests/test_scan_dir_size_progress.py",
-        ],
-    )
+    for path in TEST_PATHS:
+        rc = run_python_file(DESCRIPTION, path)
+        if rc != 0:
+            return rc
+    return 0


 if __name__ == "__main__":