
<feature>[vm]: support DELETE /vm-instances/metadata bulk cleanup #3882

Open
MatheMatrix wants to merge 1 commit into zsv_5.0.0 from
sync/tao.gan/add-APICleanupAllVmInstanceMetadataMsg-ZSV-11867@@3

Conversation

@MatheMatrix
Owner

Add APICleanupAllVmInstanceMetadata channel: API/Event/SDK,
internal CleanupAllVmMetadataOnPrimaryStorageMsg, and
LocalStorage/NFS PS handlers with cleanup-all path.
Existing single-VM cleanup channel is untouched.

Resolves: ZSV-11867

Change-Id: I66667361746274616d6879666a6479796f797174

sync from gitlab !9762

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a "clean up all VM instance metadata" feature: a new REST API, message/reply, event, and SDK types, plus host-level dispatch with concurrency/fallback and failure aggregation in LocalStorage and NFS (including the KVM agent). Also adds Chinese documentation and a test-call helper.

Changes

Cohort / File(s) / Summary

Header: messages/replies and API message/event
header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java, header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java, header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java, header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
Adds the primary-storage cleanup message/reply classes and the API request/event types. The message carries primaryStorageUuid/metadataDir, the API message carries an optional primaryStorageUuids parameter, and the event returns the list of failed primary storage UUIDs.

Chinese documentation
header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy, header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
Adds Chinese documentation for DELETE /v1/vm-instances/metadata, describing request parameters, authorization, and response fields (including the failed primary storage list and success/error).

LocalStorage backend
plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java, plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.java, plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
LocalStorageBase handles CleanupAllVmMetadataOnPrimaryStorageMsg: it queries connected hosts, dispatches concurrently to each host's backend, and aggregates per-host failures into an ErrorCode list summarized in the reply. Adds an abstract backend handler, implemented in the KVM backend via the agent HTTP interface (command/response DTOs and a path constant).

NFS primary storage and KVM backend
plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java, .../NfsPrimaryStorageBackend.java, .../NfsPrimaryStorageKVMBackend.java, .../NfsPrimaryStorageKVMBackendCommands.java
NFS primary storage adds local message handling that retries sequentially across connected hosts. The backend interface and KVM backend gain a cleanup path plus agent command/response DTOs, mapping agent results back into the reply or into host-scoped errors.

Primary storage base default handling
storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java
PrimaryStorageBase gains a default handler for CleanupAllVmMetadataOnPrimaryStorageMsg that replies "operation not supported".

SDK and test library
sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java, sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java, testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
Adds the SDK Action/Result (supporting sync/async calls and error wrapping) and an ApiHelper.cleanupAllVmInstanceMetadata wrapper in the test library with optional ApiPath tracking.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API layer
    participant PS as PrimaryStorage
    participant Host as HostAgent
    participant FS as HostFS

    Client->>API: DELETE /vm-instances/metadata
    API->>PS: CleanupAllVmMetadataOnPrimaryStorageMsg
    PS->>PS: query connected hosts
    alt no connected host
        PS-->>API: CleanupAllVmMetadataOnPrimaryStorageReply (error)
    else connected hosts exist
        par concurrently/sequentially per connected host
            PS->>Host: CleanupAllVmMetadataCmd(metadataDir)
            Host->>FS: delete VM metadata under metadataDir
            FS-->>Host: CleanupAllVmMetadataRsp / error
            Host-->>PS: per-host success or host-scoped error
        and aggregate per-host results (success/failure)
        end
        alt any host failed
            PS-->>API: CleanupAllVmMetadataOnPrimaryStorageReply (aggregated failures)
        else
            PS-->>API: CleanupAllVmMetadataOnPrimaryStorageReply (success)
        end
    end
    API-->>Client: APICleanupAllVmInstanceMetadataEvent (failedPrimaryStorageUuids)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I'm a little rabbit in the garden of code,
hopping through the metadata grass,
one API call and the hosts all stir,
successes stay silent, failures leave a trail,
rabbit ears aquiver as the logs dance past.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check ✅ Passed: PR title clearly describes the main feature: adding bulk cleanup support for VM instance metadata via the DELETE /vm-instances/metadata endpoint.
  • Description check ✅ Passed: PR description is related to the changeset, explaining the addition of the APICleanupAllVmInstanceMetadata channel and handlers for LocalStorage/NFS primary storage.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java (1)

5-30: Make failedPrimaryStorageUuids a typed (generic) collection.

The field holds a list of primary storage UUIDs, but a raw List exposes the SDK as an untyped collection, so callers can only guess the element type by convention. Change it to List<String>, and give it an empty default value so callers need not handle null when a response omits the field.

♻️ Suggested change
-    public java.util.List failedPrimaryStorageUuids;
-    public void setFailedPrimaryStorageUuids(java.util.List failedPrimaryStorageUuids) {
+    public java.util.List<String> failedPrimaryStorageUuids = new java.util.ArrayList<>();
+    public void setFailedPrimaryStorageUuids(java.util.List<String> failedPrimaryStorageUuids) {
         this.failedPrimaryStorageUuids = failedPrimaryStorageUuids;
     }
-    public java.util.List getFailedPrimaryStorageUuids() {
+    public java.util.List<String> getFailedPrimaryStorageUuids() {
         return this.failedPrimaryStorageUuids;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java`
around lines 5 - 30, Change the raw List field failedPrimaryStorageUuids in
class CleanupAllVmInstanceMetadataResult to a typed List<String>, initialize it
to an empty list (e.g. new java.util.ArrayList<>()) to avoid null responses, and
update the field declaration plus the setFailedPrimaryStorageUuids and
getFailedPrimaryStorageUuids method signatures to use java.util.List<String> so
callers get a typed collection.
testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy (1)

5609-5633: Do not modify this auto-generated helper directly.

ApiHelper.groovy is a generated file; hand-adding the new method here will likely be lost on the next regeneration, and it forks the source of truth between the SDK and the test library. Put this logic in the generator or template instead, then regenerate.

Based on learnings: ApiHelper.groovy in testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy is auto-generated and should not be manually modified or receive code change suggestions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy` around lines 5609
- 5633, This helper method cleanupAllVmInstanceMetadata was added to an
auto-generated file (ApiHelper.groovy) and must not be edited directly;
revert/remove the manual change from
testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy and instead add the
intended logic into the code generator or template that emits ApiHelper (update
the generator/template that produces methods like cleanupAllVmInstanceMetadata
so the method is generated consistently), regenerate the ApiHelper output, and
run the testlib build to confirm the generated class contains the new method
without manual edits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`:
- Around line 3676-3688: The fail(ErrorCode) branch in the cleanup callback
(inside LocalStorageBase) only calls com.addError(errorCode) but does not
increment totalFailed, causing reply.setFailedCount(...) to be underestimated;
update the fail(ErrorCode) handler to also increment the AtomicInteger
totalFailed (the same counter used when successful failures are counted) before
calling com.done(), ensuring When the WhileDoneCompletion.done(ErrorCodeList)
runs it reports the correct failed count via
reply.setFailedCount(totalFailed.get()) and still preserves
com.addError(errorCode).

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`:
- Around line 2095-2117: The current
handle(CleanupAllVmMetadataOnPrimaryStorageMsg) binds cleanup to a single online
host (connectedHosts.get(0)) and fails when no hosts are connected; change it to
be best-effort: when connectedHosts.isEmpty() do not set reply.error — set
reply.setCount(0), log a warning, and bus.reply(msg, reply) instead of failing;
otherwise avoid hard-binding to the first host by either deriving the backend
from primary/cluster metadata or iterating available connectedHosts and calling
getBackendByHostUuid(hostUuid) for each until one succeeds; ensure cleanup logic
(and calls to VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...))
treats failures as non-fatal and aggregates success/failure counts into
CleanupAllVmMetadataOnPrimaryStorageReply rather than returning an error.

---

Nitpick comments:
In `@sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java`:
- Around line 5-30: Change the raw List field failedPrimaryStorageUuids in class
CleanupAllVmInstanceMetadataResult to a typed List<String>, initialize it to an
empty list (e.g. new java.util.ArrayList<>()) to avoid null responses, and
update the field declaration plus the setFailedPrimaryStorageUuids and
getFailedPrimaryStorageUuids method signatures to use java.util.List<String> so
callers get a typed collection.

In `@testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy`:
- Around line 5609-5633: This helper method cleanupAllVmInstanceMetadata was
added to an auto-generated file (ApiHelper.groovy) and must not be edited
directly; revert/remove the manual change from
testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy and instead add the
intended logic into the code generator or template that emits ApiHelper (update
the generator/template that produces methods like cleanupAllVmInstanceMetadata
so the method is generated consistently), regenerate the ApiHelper output, and
run the testlib build to confirm the generated class contains the new method
without manual edits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2d0df6dc-817a-40b8-87b3-3916926fcc24

📥 Commits

Reviewing files that changed from the base of the PR and between 4d74f1a and ebcd372.

⛔ Files ignored due to path filters (1)
  • conf/serviceConfig/vmInstance.xml is excluded by !**/*.xml
📒 Files selected for processing (17)
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java
  • testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy

Comment on lines +2095 to +2117
@Override
protected void handle(CleanupAllVmMetadataOnPrimaryStorageMsg msg) {
    CleanupAllVmMetadataOnPrimaryStorageReply reply = new CleanupAllVmMetadataOnPrimaryStorageReply();
    List<HostInventory> connectedHosts = factory.getConnectedHostForOperation(getSelfInventory());
    if (connectedHosts.isEmpty()) {
        reply.setError(operr("no connected host found for NFS primary storage[uuid:%s]", self.getUuid()));
        bus.reply(msg, reply);
        return;
    }
    String hostUuid = connectedHosts.get(0).getUuid();
    final NfsPrimaryStorageBackend backend = getBackendByHostUuid(hostUuid);
    backend.handle(msg, hostUuid, new ReturnValueCompletion<CleanupAllVmMetadataOnPrimaryStorageReply>(msg) {
        @Override
        public void success(CleanupAllVmMetadataOnPrimaryStorageReply r) {
            bus.reply(msg, r);
        }

        @Override
        public void fail(ErrorCode errorCode) {
            reply.setError(errorCode);
            bus.reply(msg, reply);
        }
    });

⚠️ Potential issue | 🟠 Major

Do not hard-bind bulk cleanup to an online host.

The current implementation requires connectedHosts to be non-empty and routes everything through the first host in the list. That makes bulk metadata cleanup impossible while all hosts are briefly offline, and it ties backend selection to list ordering. Since the reply already has count fields, it is better to return a zero count and log a warning, or to derive the backend from primary-storage/cluster metadata before executing.
Based on learnings, VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...) failures are commonly treated as best-effort and do not block the primary operation.

🛠️ Suggested change
         List<HostInventory> connectedHosts = factory.getConnectedHostForOperation(getSelfInventory());
         if (connectedHosts.isEmpty()) {
-            reply.setError(operr("no connected host found for NFS primary storage[uuid:%s]", self.getUuid()));
+            logger.warn(String.format(
+                    "no connected host found for NFS primary storage[uuid:%s] when cleaning VM metadata",
+                    self.getUuid()));
+            reply.setCleanedCount(0);
+            reply.setFailedCount(0);
             bus.reply(msg, reply);
             return;
         }
         String hostUuid = connectedHosts.get(0).getUuid();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`
around lines 2095 - 2117, The current
handle(CleanupAllVmMetadataOnPrimaryStorageMsg) binds cleanup to a single online
host (connectedHosts.get(0)) and fails when no hosts are connected; change it to
be best-effort: when connectedHosts.isEmpty() do not set reply.error — set
reply.setCount(0), log a warning, and bus.reply(msg, reply) instead of failing;
otherwise avoid hard-binding to the first host by either deriving the backend
from primary/cluster metadata or iterating available connectedHosts and calling
getBackendByHostUuid(hostUuid) for each until one succeeds; ensure cleanup logic
(and calls to VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...))
treats failures as non-fatal and aggregates success/failure counts into
CleanupAllVmMetadataOnPrimaryStorageReply rather than returning an error.

@zstack-robot-1
Collaborator

Comment from yaohua.wu:

Review: MR !9762 — ZSV-11867

Base-repo changes for the DELETE /vm-instances/metadata bulk cleanup API (API/Event/SDK + LocalStorage/NFS PS handlers + a PrimaryStorageBase default implementation).

Related MRs

  • premium !13718 — MevocoVmManagerImpl takes over the API message, fans out to each PS, and aggregates totalCleaned/totalFailed; includes admin authorization and the SharedBlock backend
  • zstack-utility !7001 — cleanup-all endpoints for the three kvmagent plugins (localstorage/nfs/sharedblock), plus the file_metadata_handler / lv_metadata_do_cleanup_all / cleanup_all implementations

Full call chain: Client → premium MevocoVmManagerImpl.handle(APICleanupAllVmInstanceMetadataMsg) → zstack CleanupAllVmMetadataOnPrimaryStorageMsg → per-PS base.handle → backend → kvmagent. The Cmd/Rsp schemas and HTTP paths are already aligned across the three repos.


🔴 Critical

1. plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java:3679-3687
   Issue: the fail(ErrorCode) branch only calls com.addError and never increments totalFailed. When the agent call for a host fails at the network level or the agent crashes, that host's failure is not counted in reply.failedCount at all. Combined with the premium-side reply handling (see related MRs), this PS's cleaned/failed counts are dropped wholesale and the API only reports "1 PS failed", losing all host-level data.
   Fix: increment totalFailed before adding the error in fail():

       public void fail(ErrorCode errorCode) {
           logger.warn(...);
           totalFailed.incrementAndGet();
           com.addError(errorCode);
           com.done();
       }

   The catch branch for exceptions thrown while preparing the backend should increment the counter as well. CodeRabbit flagged this in thread 722151; the author marked it resolved, but the commit does not fix it.

🟡 Warning

1. sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java:25-29
   Issue: failedPrimaryStorageUuids is a raw, untyped java.util.List.
   Fix: change it to List<String> and default the field to new ArrayList<>() so callers need not handle null when a response omits the field.
2. plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java:2095-2117
   Issue: only connectedHosts.get(0) is used; if that host drops offline or is isolated before backend.handle runs, the operation fails. NFS metadata lives on the shared volume, so in principle any connected host can perform the cleanup.
   Fix: after the chosen host fails, fall back to the next connected host (bounded fallback), or at least include the hostUuid in the error message to ease triage.
3. plugin/localstorage/.../LocalStorageBase.java:3631-3645
   Issue: the implementation inlines a LocalStorageHostRefVO + HostVO SQL query instead of reusing existing helpers in this class such as getConnectedHostUuids(); the inconsistent pattern is likely to drift from the existing host-state filtering logic over time.
   Fix: reuse the class's existing host-filtering helpers (consistent with the cleanupVmInstanceMetadata path).
4. header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
   Issue: the reply has no failedPrimaryStorageUuids field, so the premium side can only use r.getFailedCount() > 0 to decide whether to add the PS uuid to failedPsUuids, losing the concrete list of failed hosts.
   Fix: depending on product requirements, consider adding List<String> failedHostUuids to the reply so premium aggregation can pass it through to the event.
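The bounded fallback suggested for the NFS path can be sketched as follows. This is a hypothetical stand-alone sketch, not the project's code: `cleanupWithFallback` and `cleanupOnHost` are illustrative names, with `cleanupOnHost` standing in for `getBackendByHostUuid(hostUuid).handle(...)`. The point is to try connected hosts in order and, on exhaustion, surface the last concrete error (with its hostUuid) instead of a generic one.

```java
import java.util.List;
import java.util.function.Function;

public class CleanupFallback {
    public static final class Result {
        public final boolean success;
        public final String hostUuid;   // host that served the request, if any
        public final String lastError;  // last concrete failure, kept for triage
        Result(boolean success, String hostUuid, String lastError) {
            this.success = success;
            this.hostUuid = hostUuid;
            this.lastError = lastError;
        }
    }

    // Try each connected host in order; stop at the first success.
    // cleanupOnHost returns null on success, or an error message on failure.
    public static Result cleanupWithFallback(List<String> connectedHosts,
                                             Function<String, String> cleanupOnHost) {
        String lastError = "no connected host found";
        for (String host : connectedHosts) {
            String err = cleanupOnHost.apply(host);
            if (err == null) {
                return new Result(true, host, null);
            }
            // Keep the hostUuid in the error so operators can tell which host failed.
            lastError = "host " + host + ": " + err;
        }
        return new Result(false, null, lastError);
    }
}
```

With this shape, a transient failure on the first host no longer fails the whole operation, and when every host fails the reply still carries the final host-scoped error rather than a generic "operation failed".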

🟢 Suggestion

1. testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy:5609-5633
   Issue: this is an auto-generated file; a hand-added method will be overwritten on the next regeneration.
   Fix: add the method in the SDK generator/template instead, then regenerate.
2. header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy:46
   Issue: the groovy doc is missing a trailing newline ("No newline at end of file"), inconsistent with the other doc files in the directory.
   Fix: add the trailing newline.
3. plugin/localstorage/.../LocalStorageBase.java:3633
   Issue: the log says "no connected host", but the code only calls logger.warn and returns 0/0, so the caller has no way to learn the PS state.
   Fix: consider carrying a semantic hint in the reply (e.g. an errorMessage) that does not block success but lets upper-layer operators notice.

Cross-repo consistency

✅ The HTTP paths /{local,nfs,sharedblock}/vm/metadata/cleanup-all match the constants in the three zstack-utility plugins.
CleanupAllVmMetadataCmd.metadataDir is built by premium MevocoVmManagerImpl via VmMetadataPathBuildExtensionPoint.buildMetadataDir(psUuid) and passed through.
⚠️ The premium side relies on VmMetadataPathBuildExtensionPoint being registered: this MR appears to have the extension point on the LocalStorage/NFS paths (inferred from the cleanup-vm-metadata path), but whether SharedBlock registers it must be confirmed in the premium repo. If it is not registered, a SharedBlock PS is silently skipped in implicit mode and counted as failed in explicit mode.

Verdict

REVISION_REQUIRED — 1 Critical (host-level failures are not counted). The remaining Warnings/Suggestions can be handled in the same pass. Recommend regression-testing together with the revised reply.error handling in premium !13718.


🤖 Robot Reviewer

@MatheMatrix MatheMatrix force-pushed the sync/tao.gan/add-APICleanupAllVmInstanceMetadataMsg-ZSV-11867@@3 branch 2 times, most recently from 86e00cf to 274665c Compare April 29, 2026 09:40
@coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java (1)

2095-2131: ⚠️ Potential issue | 🟠 Major

Do not treat bulk cleanup as a hard failure.

Returning an error when connectedHosts.isEmpty() blocks metadata cleanup whenever the primary storage temporarily has no online host; a deletion like this is better treated as best-effort, at minimum returning an empty result and logging a warning.
Also, swapping in a generic error when idx >= connectedHosts.size() drops the last concrete failure's cause, leaving too little information for troubleshooting.

🛠️ Suggested change
         List<HostInventory> connectedHosts = factory.getConnectedHostForOperation(getSelfInventory());
         if (connectedHosts.isEmpty()) {
-            reply.setError(operr("no connected host found for NFS primary storage[uuid:%s]", self.getUuid()));
+            logger.warn(String.format(
+                    "no connected host found for NFS primary storage[uuid:%s] when cleaning all vm metadata",
+                    self.getUuid()));
             bus.reply(msg, reply);
             return;
         }

Based on learnings, VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...) failures are commonly treated as best-effort and do not block the primary operation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`
around lines 2095 - 2131, The current cleanup logic treats "no connected host"
and exhausting hosts as hard failures:
handle(CleanupAllVmMetadataOnPrimaryStorageMsg) returns an error when
connectedHosts.isEmpty() and cleanupAllOnHostWithFallback sets a generic error
when idx >= connectedHosts.size(), losing the last error detail; change both to
best-effort: when connectedHosts.isEmpty() log a warning and reply with an
empty/successful CleanupAllVmMetadataOnPrimaryStorageReply (no error) so callers
aren’t blocked, and in cleanupAllOnHostWithFallback when idx >=
connectedHosts.size() set a warning and reply without a failure OR include the
last ErrorCode/message from the final fail() callback (capture it in a finalErr
variable) so you preserve diagnostic info instead of a generic operr; touch the
methods handle(CleanupAllVmMetadataOnPrimaryStorageMsg) and
cleanupAllOnHostWithFallback and the fail() override to implement these
behaviors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`:
- Around line 3637-3644: The cleanup-all flow in LocalStorageBase only iterates
connected hosts (via getConnectedLocalStorageHostUuids) and treats "no connected
host" as zero failures, which undercounts failures and misses disconnected hosts
in failedCount/failedHostUuids; update the cleanup-all handling in
LocalStorageBase so it builds the host list from all known local-storage hosts
(or union of connected + disconnected), attempt/record cleanup per-host, and
when there are zero connected hosts still populate reply.failedCount and
reply.failedHostUuids with the disconnected hosts (and include self.getUuid in
the log message), ensuring reply.setCleanedCount, reply.setFailedCount and
reply.setFailedHostUuids are set before bus.reply(msg, reply); modify the code
paths around getConnectedLocalStorageHostUuids, reply, msg, and variables
failedCount/failedHostUuids to aggregate results for both connected and
disconnected hosts.

---

Duplicate comments:
In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`:
- Around line 2095-2131: The current cleanup logic treats "no connected host"
and exhausting hosts as hard failures:
handle(CleanupAllVmMetadataOnPrimaryStorageMsg) returns an error when
connectedHosts.isEmpty() and cleanupAllOnHostWithFallback sets a generic error
when idx >= connectedHosts.size(), losing the last error detail; change both to
best-effort: when connectedHosts.isEmpty() log a warning and reply with an
empty/successful CleanupAllVmMetadataOnPrimaryStorageReply (no error) so callers
aren’t blocked, and in cleanupAllOnHostWithFallback when idx >=
connectedHosts.size() set a warning and reply without a failure OR include the
last ErrorCode/message from the final fail() callback (capture it in a finalErr
variable) so you preserve diagnostic info instead of a generic operr; touch the
methods handle(CleanupAllVmMetadataOnPrimaryStorageMsg) and
cleanupAllOnHostWithFallback and the fail() override to implement these
behaviors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9cb708fc-ba57-49e8-8db8-c581cf975295

📥 Commits

Reviewing files that changed from the base of the PR and between ebcd372 and 86e00cf.

⛔ Files ignored due to path filters (1)
  • conf/serviceConfig/vmInstance.xml is excluded by !**/*.xml
📒 Files selected for processing (17)
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java
  • testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
✅ Files skipped from review due to trivial changes (1)
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
🚧 Files skipped from review as they are similar to previous changes (7)
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy

Comment on lines +3637 to +3644
List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
if (connectedHostUuids.isEmpty()) {
    logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s]", self.getUuid()));
    reply.setCleanedCount(0);
    reply.setFailedCount(0);
    bus.reply(msg, reply);
    return;
}

⚠️ Potential issue | 🟠 Major

cleanup-all does not cover disconnected hosts, so the failure count is underestimated.

The current code only uses Connected hosts as the cleanup set; when Disconnected hosts exist they are missing from failedCount and failedHostUuids, and line 3638 even returns 0 failures in the "no connected host" case, which is inconsistent with the "clean up everything" semantics.

Suggested fix (example)
-        List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
+        List<String> allHostUuids = getAllLocalStorageHostUuids();
+        List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
+        List<String> disconnectedHostUuids = new ArrayList<>(allHostUuids);
+        disconnectedHostUuids.removeAll(connectedHostUuids);

         if (connectedHostUuids.isEmpty()) {
             logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s]", self.getUuid()));
             reply.setCleanedCount(0);
-            reply.setFailedCount(0);
+            reply.setFailedCount(disconnectedHostUuids.size());
+            reply.setFailedHostUuids(disconnectedHostUuids);
             bus.reply(msg, reply);
             return;
         }

         AtomicInteger totalCleaned = new AtomicInteger(0);
-        AtomicInteger totalFailed = new AtomicInteger(0);
-        List<String> failedHostUuids = Collections.synchronizedList(new ArrayList<>());
+        AtomicInteger totalFailed = new AtomicInteger(disconnectedHostUuids.size());
+        List<String> failedHostUuids = Collections.synchronizedList(new ArrayList<>(disconnectedHostUuids));
+    private List<String> getAllLocalStorageHostUuids() {
+        return SQL.New(
+                        "select h.hostUuid from LocalStorageHostRefVO h where h.primaryStorageUuid = :psUuid", String.class)
+                .param("psUuid", self.getUuid())
+                .list();
+    }

Also applies to: 3699-3708

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`
around lines 3637 - 3644, The cleanup-all flow in LocalStorageBase only iterates
connected hosts (via getConnectedLocalStorageHostUuids) and treats "no connected
host" as zero failures, which undercounts failures and misses disconnected hosts
in failedCount/failedHostUuids; update the cleanup-all handling in
LocalStorageBase so it builds the host list from all known local-storage hosts
(or union of connected + disconnected), attempt/record cleanup per-host, and
when there are zero connected hosts still populate reply.failedCount and
reply.failedHostUuids with the disconnected hosts (and include self.getUuid in
the log message), ensuring reply.setCleanedCount, reply.setFailedCount and
reply.setFailedHostUuids are set before bus.reply(msg, reply); modify the code
paths around getConnectedLocalStorageHostUuids, reply, msg, and variables
failedCount/failedHostUuids to aggregate results for both connected and
disconnected hosts.

@coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java (1)

3637-3644: ⚠️ Potential issue | 🟠 Major

cleanup-all still under-records disconnected hosts.

Line 3637 and Line 3699 enumerate only connected hosts, and Line 3638 returns failedCount=0 outright when every host is disconnected. Whenever this local PS has mounted-but-disconnected hosts, the final failedCount and failedHostUuids come out too small, contradicting the "cleanup all" semantics.

💡 Suggested fix
+    private List<String> getAllLocalStorageHostUuids() {
+        return SQL.New(
+                        "select h.hostUuid from LocalStorageHostRefVO h where h.primaryStorageUuid = :psUuid", String.class)
+                .param("psUuid", self.getUuid())
+                .list();
+    }
+
     @Override
     protected void handle(final CleanupAllVmMetadataOnPrimaryStorageMsg msg) {
         CleanupAllVmMetadataOnPrimaryStorageReply reply = new CleanupAllVmMetadataOnPrimaryStorageReply();
 
-        List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
+        List<String> allHostUuids = getAllLocalStorageHostUuids();
+        List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
+        List<String> disconnectedHostUuids = new ArrayList<>(allHostUuids);
+        disconnectedHostUuids.removeAll(connectedHostUuids);
+
         if (connectedHostUuids.isEmpty()) {
             logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s]", self.getUuid()));
             reply.setCleanedCount(0);
-            reply.setFailedCount(0);
+            reply.setFailedCount(disconnectedHostUuids.size());
+            reply.setFailedHostUuids(new ArrayList<>(disconnectedHostUuids));
             bus.reply(msg, reply);
             return;
         }
 
         AtomicInteger totalCleaned = new AtomicInteger(0);
-        AtomicInteger totalFailed = new AtomicInteger(0);
-        List<String> failedHostUuids = Collections.synchronizedList(new ArrayList<>());
+        AtomicInteger totalFailed = new AtomicInteger(disconnectedHostUuids.size());
+        List<String> failedHostUuids = Collections.synchronizedList(new ArrayList<>(disconnectedHostUuids));

Also applies to: 3699-3708

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`
around lines 3637 - 3644, The current early-return when
getConnectedLocalStorageHostUuids() is empty causes disconnected hosts with
mounts to be ignored; instead of returning with
reply.setFailedCount(0)/reply.setCleanedCount(0) and bus.reply(msg, reply),
query for all hosts that have mounts on this local primary storage (not just
connected ones), treat any disconnected-but-mounted hosts as failures (populate
reply.setFailedHostUuids(...) and reply.setFailedCount(...)), continue to
attempt cleanup only on connected hosts returned by
getConnectedLocalStorageHostUuids(), and compute reply.setCleanedCount(...) from
successful cleanups; apply the same fix to the other identical block that
currently enumerates only connected hosts.
🧹 Nitpick comments (1)
plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java (1)

3982-3984: Consider adding host/primary-storage context on the failure branch to speed up troubleshooting.

The code currently calls completion.fail(errorCode) directly, which makes it hard to tell which host a failure came from in a multi-host run. Wrap the error here with the hostUuid (optionally also self.getUuid()).

♻️ Suggested change
            @Override
             public void fail(ErrorCode errorCode) {
-                completion.fail(errorCode);
+                completion.fail(operr("cleanup all vm metadata failed on host[uuid:%s], primaryStorage[uuid:%s]",
+                        hostUuid, self.getUuid()).causedBy(errorCode));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java`
around lines 3982 - 3984, Wrap the error passed to completion.fail with
contextual info so failures include host and primary storage identifiers: locate
the anonymous Failure callback inside LocalStorageKvmBackend where
fail(ErrorCode errorCode) currently calls completion.fail(errorCode) and change
it to construct a new ErrorCode or augment the existing one with hostUuid (and
optionally self.getUuid()) in the error details/message before calling
completion.fail; ensure you use the same ErrorCode type/fields used elsewhere so
logs and callers receive the enriched error context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java`:
- Around line 33-34: The setter setFailedHostUuids currently allows assigning
null to the failedHostUuids field which breaks the non-null invariant; change
setFailedHostUuids(List<String> failedHostUuids) so it normalizes null to an
empty list and makes a defensive copy (e.g. this.failedHostUuids =
failedHostUuids == null ? Collections.emptyList() : new
ArrayList<>(failedHostUuids)) to ensure failedHostUuids is never null and cannot
be mutated externally; update imports if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java`:
- Around line 2214-2228: In the cleanup reply handling of
NfsPrimaryStorageKVMBackend, completion.fail(...) is currently called only when
rsp.isSuccess() is false, and neither failedHostUuids nor the host context is
filled in, so the aggregated result lacks the information needed to locate
failures. In the branch handling KVMHostAsyncHttpCallReply->CleanupAllVmMetadataRsp:
when rsp.isSuccess() is false or rsp.failedCount > 0, add the current hostUuid
(or wrapped host info) to a failedHostUuids list and set that list together with
cleanedCount/failedCount on CleanupAllVmMetadataOnPrimaryStorageReply (or
include the hostUuid in the error context), then return the aggregated result
via completion.success(r) (or call completion.fail(...) with an error that
carries failedHostUuids), keeping the identifiers CleanupAllVmMetadataRsp,
CleanupAllVmMetadataOnPrimaryStorageReply, and NfsPrimaryStorageKVMBackend so
the change site can be located.

---

Duplicate comments:
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`:
- Around line 3637-3644: The current early-return when
getConnectedLocalStorageHostUuids() is empty causes disconnected hosts with
mounts to be ignored; instead of returning with
reply.setFailedCount(0)/reply.setCleanedCount(0) and bus.reply(msg, reply),
query for all hosts that have mounts on this local primary storage (not just
connected ones), treat any disconnected-but-mounted hosts as failures (populate
reply.setFailedHostUuids(...) and reply.setFailedCount(...)), continue to
attempt cleanup only on connected hosts returned by
getConnectedLocalStorageHostUuids(), and compute reply.setCleanedCount(...) from
successful cleanups; apply the same fix to the other identical block that
currently enumerates only connected hosts.

---

Nitpick comments:
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java`:
- Around line 3982-3984: Wrap the error passed to completion.fail with
contextual info so failures include host and primary storage identifiers: locate
the anonymous Failure callback inside LocalStorageKvmBackend where
fail(ErrorCode errorCode) currently calls completion.fail(errorCode) and change
it to construct a new ErrorCode or augment the existing one with hostUuid (and
optionally self.getUuid()) in the error details/message before calling
completion.fail; ensure you use the same ErrorCode type/fields used elsewhere so
logs and callers receive the enriched error context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2a3ae605-6aa3-4fdb-9c5c-5134a076677c

📥 Commits

Reviewing files that changed from the base of the PR and between 86e00cf and 274665c.

⛔ Files ignored due to path filters (1)
  • conf/serviceConfig/vmInstance.xml is excluded by !**/*.xml
📒 Files selected for processing (17)
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java
  • testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
✅ Files skipped from review due to trivial changes (2)
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
🚧 Files skipped from review as they are similar to previous changes (5)
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy

Comment on lines +33 to +34
public void setFailedHostUuids(List<String> failedHostUuids) {
this.failedHostUuids = failedHostUuids;


⚠️ Potential issue | 🟡 Minor

Keep failedHostUuids non-null so null-check branches don't spread downstream.

The setter can currently set the field to null, which weakens the usability contract of this reply.

🔧 Suggested fix
 public void setFailedHostUuids(List<String> failedHostUuids) {
-    this.failedHostUuids = failedHostUuids;
+    this.failedHostUuids = failedHostUuids == null ? new ArrayList<>() : failedHostUuids;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java`
around lines 33 - 34, The setter setFailedHostUuids currently allows assigning
null to the failedHostUuids field which breaks the non-null invariant; change
setFailedHostUuids(List<String> failedHostUuids) so it normalizes null to an
empty list and makes a defensive copy (e.g. this.failedHostUuids =
failedHostUuids == null ? Collections.emptyList() : new
ArrayList<>(failedHostUuids)) to ensure failedHostUuids is never null and cannot
be mutated externally; update imports if needed.
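To make the invariant concrete, here is a standalone sketch of the normalizing setter; the class is a hypothetical stand-in for the reply, with a defensive copy and read-only getter added on top of the null check:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for CleanupAllVmMetadataOnPrimaryStorageReply,
// showing only the null-normalizing, defensively copied setter.
public class CleanupReplySketch {
    private List<String> failedHostUuids = new ArrayList<>();

    public void setFailedHostUuids(List<String> failedHostUuids) {
        // Normalize null to an empty list and copy so external mutation
        // cannot change the reply after it is set.
        this.failedHostUuids = failedHostUuids == null
                ? new ArrayList<>()
                : new ArrayList<>(failedHostUuids);
    }

    public List<String> getFailedHostUuids() {
        // Expose a read-only view to preserve the non-null invariant.
        return Collections.unmodifiableList(failedHostUuids);
    }

    public static void main(String[] args) {
        CleanupReplySketch reply = new CleanupReplySketch();
        reply.setFailedHostUuids(null);
        System.out.println(reply.getFailedHostUuids().size()); // 0

        List<String> src = new ArrayList<>(List.of("host-1"));
        reply.setFailedHostUuids(src);
        src.add("host-2"); // does not leak into the reply
        System.out.println(reply.getFailedHostUuids().size()); // 1
    }
}
```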

Comment on lines +2214 to +2228
if (!reply.isSuccess()) {
completion.fail(reply.getError());
return;
}

CleanupAllVmMetadataRsp rsp = ((KVMHostAsyncHttpCallReply) reply).toResponse(CleanupAllVmMetadataRsp.class);
if (!rsp.isSuccess()) {
completion.fail(operr("failed to cleanup all vm metadata on nfs via host[uuid:%s]: %s", hostUuid, rsp.getError()));
return;
}

CleanupAllVmMetadataOnPrimaryStorageReply r = new CleanupAllVmMetadataOnPrimaryStorageReply();
r.setCleanedCount(rsp.cleanedCount);
r.setFailedCount(rsp.failedCount);
completion.success(r);


⚠️ Potential issue | 🟡 Minor

Fill in the failure context and failedHostUuids so the aggregated result doesn't lose failure-location information.

Failures currently lack host wrapping information, and when failedCount > 0 the host dimension is never filled in, leaving troubleshooting data incomplete.

🔧 Suggested fix
                 if (!reply.isSuccess()) {
-                    completion.fail(reply.getError());
+                    completion.fail(operr("failed to cleanup all vm metadata on nfs via host[uuid:%s]: %s",
+                            hostUuid, reply.getError()));
                     return;
                 }
@@
                 CleanupAllVmMetadataOnPrimaryStorageReply r = new CleanupAllVmMetadataOnPrimaryStorageReply();
                 r.setCleanedCount(rsp.cleanedCount);
                 r.setFailedCount(rsp.failedCount);
+                if (rsp.failedCount > 0) {
+                    r.setFailedHostUuids(Collections.singletonList(hostUuid));
+                }
                 completion.success(r);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java`
around lines 2214 - 2228, In the cleanup reply handling of
NfsPrimaryStorageKVMBackend, completion.fail(...) is currently called only when
rsp.isSuccess() is false, and neither failedHostUuids nor the host context is
filled in, so the aggregated result lacks failure-location information. In the
branch handling KVMHostAsyncHttpCallReply->CleanupAllVmMetadataRsp: when
rsp.isSuccess() is false or rsp.failedCount > 0, add the current hostUuid (or
wrapped host info) to a failedHostUuids list and set that list together with
cleanedCount/failedCount on CleanupAllVmMetadataOnPrimaryStorageReply (or
include the hostUuid in the error context), then return the aggregated result
via completion.success(r) (or call completion.fail(...) with an error that
carries failedHostUuids), keeping the identifiers CleanupAllVmMetadataRsp,
CleanupAllVmMetadataOnPrimaryStorageReply, and NfsPrimaryStorageKVMBackend so
the change site can be located.

@MatheMatrix MatheMatrix force-pushed the sync/tao.gan/add-APICleanupAllVmInstanceMetadataMsg-ZSV-11867@@3 branch from 274665c to 8dab03c Compare April 29, 2026 10:22

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java (2)

3635-3677: ⚠️ Potential issue | 🔴 Critical

The bulk-cleanup result never fills in the counts, so the aggregate stays inaccurate.

Between its creation and bus.reply(...), the reply records no cleaned/failed statistics, and the catch branch and fail(...) branch only call addError without incrementing the failure count. As a result, even when some hosts cleaned up successfully and others failed, the caller still receives the default counts, so the final cleaned/failed totals are wrong.

Suggested fix
     protected void handle(final CleanupAllVmMetadataOnPrimaryStorageMsg msg) {
         CleanupAllVmMetadataOnPrimaryStorageReply reply = new CleanupAllVmMetadataOnPrimaryStorageReply();
+        AtomicInteger totalCleaned = new AtomicInteger(0);
+        AtomicInteger totalFailed = new AtomicInteger(0);

         List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
         if (connectedHostUuids.isEmpty()) {
             logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s]", self.getUuid()));
+            reply.setCleanedCount(0);
+            reply.setFailedCount(0);
             bus.reply(msg, reply);
             return;
         }

         new While<>(connectedHostUuids).all((hostUuid, com) -> {
             final LocalStorageHypervisorBackend bkd;
             try {
                 LocalStorageHypervisorFactory f = getHypervisorBackendFactoryByHostUuid(hostUuid);
                 bkd = f.getHypervisorBackend(self);
             } catch (Exception e) {
                 logger.warn(String.format("[MetadataCleanup] cleanAll: failed to prepare backend for host[uuid:%s] on ps[uuid:%s]: %s",
                         hostUuid, self.getUuid(), e.getMessage()));
+                totalFailed.incrementAndGet();
                 com.addError(operr("host[uuid:%s] backend prepare failed: %s", hostUuid, e.getMessage()));
                 com.done();
                 return;
             }
             bkd.handle(msg, hostUuid, new ReturnValueCompletion<CleanupAllVmMetadataOnPrimaryStorageReply>(com) {
                @Override
                 public void success(CleanupAllVmMetadataOnPrimaryStorageReply returnValue) {
+                    totalCleaned.addAndGet(returnValue.getCleanedCount());
+                    totalFailed.addAndGet(returnValue.getFailedCount());
                     com.done();
                 }

                @Override
                 public void fail(ErrorCode errorCode) {
                     logger.warn(String.format("[MetadataCleanup] cleanAll: failed on host[uuid:%s] on ps[uuid:%s]: %s",
                             hostUuid, self.getUuid(), errorCode));
+                    totalFailed.incrementAndGet();
                     com.addError(operr("host[uuid:%s]: %s", hostUuid, errorCode.getDescription()));
                     com.done();
                 }
             });
         }).run(new WhileDoneCompletion(msg) {
            @Override
             public void done(ErrorCodeList errorCodeList) {
+                reply.setCleanedCount(totalCleaned.get());
+                reply.setFailedCount(totalFailed.get());
                 if (!errorCodeList.getCauses().isEmpty()) {
                     reply.setError(operr("local primary storage[uuid:%s] cleanAll failed on %d/%d host(s): %s",
                             self.getUuid(), errorCodeList.getCauses().size(), connectedHostUuids.size(), errorCodeList));
                 }
                 bus.reply(msg, reply);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`
around lines 3635 - 3677, The reply never accumulates per-host success/failure
counts; update the loop around getConnectedLocalStorageHostUuids/While and the
LocalStorageHypervisorBackend.handle callbacks to increment counters and set
them on CleanupAllVmMetadataOnPrimaryStorageReply before bus.reply: create
atomic/volatile counters (e.g., int successCount/failCount or AtomicInteger)
outside the While, increment successCount in ReturnValueCompletion.success and
failCount in both the backend-prepare catch block and ReturnValueCompletion.fail
(also include error messages already added), and finally set
reply.setSuccessCount(successCount) and reply.setFailCount(failCount) (or the
reply fields that represent cleaned/failed counts) in the
WhileDoneCompletion.done before calling bus.reply(msg, reply).
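The counter pattern the fix relies on can be demonstrated in isolation. The threads and latch below stand in for ZStack's While/WhileDoneCompletion machinery and are not its API; the point is that the shared AtomicIntegers are bumped inside each callback and read only after all callbacks complete:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: per-host async callbacks bump shared counters,
// totals are read only once every callback has signalled completion.
public class AsyncCounterSketch {
    public static int[] cleanAll(List<String> hostUuids) throws InterruptedException {
        AtomicInteger totalCleaned = new AtomicInteger(0);
        AtomicInteger totalFailed = new AtomicInteger(0);
        CountDownLatch done = new CountDownLatch(hostUuids.size());

        for (String uuid : hostUuids) {
            new Thread(() -> {
                try {
                    if (uuid.startsWith("bad")) {
                        totalFailed.incrementAndGet();   // fail(...) path
                    } else {
                        totalCleaned.incrementAndGet();  // success(...) path
                    }
                } finally {
                    done.countDown();                    // com.done() analogue
                }
            }).start();
        }

        done.await(); // WhileDoneCompletion.done analogue: all hosts finished
        return new int[]{totalCleaned.get(), totalFailed.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] totals = cleanAll(List.of("h1", "h2", "bad-h3"));
        System.out.println(totals[0] + "/" + totals[1]); // 2/1
    }
}
```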

3637-3641: ⚠️ Potential issue | 🟠 Major

cleanup-all still looks only at connected hosts; disconnected hosts are silently skipped.

Line 3637 fetches only connected hosts, and Lines 3638-3641 return success outright when no host is connected. As a result, whenever the local storage still has mounted-but-disconnected hosts, the caller gets a "0 failures" result, contradicting the bulk-cleanup semantics. As in this class's scan path, first fetch all hosts, then pre-record the disconnected ones as failures.

Suggested fix
-        List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
-        if (connectedHostUuids.isEmpty()) {
-            logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s]", self.getUuid()));
-            bus.reply(msg, reply);
-            return;
-        }
+        List<String> allHostUuids = getAllLocalStorageHostUuids();
+        List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
+        int disconnectedHostCount = allHostUuids.size() - connectedHostUuids.size();
+
+        if (connectedHostUuids.isEmpty()) {
+            logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s], total hosts=%d",
+                    self.getUuid(), allHostUuids.size()));
+            reply.setFailedCount(disconnectedHostCount);
+            bus.reply(msg, reply);
+            return;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`
around lines 3637 - 3641, The current cleanup-all logic uses
getConnectedLocalStorageHostUuids() and returns early when that list is empty
(logger.warn + bus.reply(msg, reply)), which silently omits
disconnected-but-attached hosts; change it to first collect the full set of
hosts attached to this local primary storage (use the class's existing full-host
scan helper — e.g. a method like
getAllLocalStorageHostUuids()/getLocalStorageHostUuidsForPrimary or similar),
compute connected = getConnectedLocalStorageHostUuids(), then for every host in
the full set that is not in connected add a failure entry to the reply result
(marking those hosts as cleanup-failed) instead of returning early; continue
processing connected hosts as before and send bus.reply(msg, reply) only after
building failures for disconnected hosts and results for processed connected
hosts.
🧹 Nitpick comments (3)
testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy (1)

5609-5633: Move this new helper into the SDK generator/template instead of editing ApiHelper.groovy directly.

This file is a generated artifact; changing it here is likely to be overwritten by the next generation run and introduces maintenance drift. Update the template, then regenerate testlib.

Based on learnings: testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy is an auto-generated file and should not be edited by hand.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy` around lines 5609
- 5633, the new cleanupAllVmInstanceMetadata implementation should not be added
by editing the generated artifact directly; migrate it into the SDK
generator/template (update the template that produces ApiHelper to include the
cleanupAllVmInstanceMetadata logic, including the a.sessionId =
Test.currentEnvSpec?.session?.uuid assignment, the a.apiId assignment, wrapping
the call with ApiPathTracker, and returning the value via errorOut), then rerun
the generation pipeline to produce testlib, ensuring no further hand edits are
made to testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy so they cannot
be overwritten or cause maintenance drift.
header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java (1)

10-10: Default-initialize failedPrimaryStorageUuids to an empty list.

It currently defaults to null, so in the no-failure case the response body can flip between null and [], forcing callers to add extra null checks. An empty-list default keeps the API/SDK return shape stable.

Optional change
+import java.util.ArrayList;
 import java.util.List;
 ...
 public class APICleanupAllVmInstanceMetadataEvent extends APIEvent {
-    private List<String> failedPrimaryStorageUuids;
+    private List<String> failedPrimaryStorageUuids = new ArrayList<>();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java`
at line 10, The field failedPrimaryStorageUuids in class
APICleanupAllVmInstanceMetadataEvent should be default-initialized to an empty
list instead of null; update the declaration of failedPrimaryStorageUuids to
assign an empty List (e.g., Collections.emptyList() or new ArrayList<>()) so
responses always return [] when there are no failures, and add the necessary
import for Collections or ArrayList if required.
sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java (1)

28-35: 请给这些列表参数补上泛型。

primaryStorageUuidssystemTagsuserTags 现在都是原始 List,调用方可以在编译期塞入任意元素类型,问题会被推迟到序列化或服务端校验时才暴露。这里更适合声明成 List<String>;如果这是生成代码,建议一并修正 SDK 生成模板。

可选修改
-    public java.util.List primaryStorageUuids;
+    public java.util.List<String> primaryStorageUuids;
 ...
-    public java.util.List systemTags;
+    public java.util.List<String> systemTags;
 ...
-    public java.util.List userTags;
+    public java.util.List<String> userTags;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java`
around lines 28 - 35, The three fields in
CleanupAllVmInstanceMetadataAction—primaryStorageUuids, systemTags, and
userTags—are declared as raw java.util.List and should be changed to use
generics (List<String>) to enforce compile-time type safety; update their
declarations to public java.util.List<String> primaryStorageUuids, public
java.util.List<String> systemTags, and public java.util.List<String> userTags,
add or adjust any necessary imports (java.util.List) and update any places that
construct or reference these fields to use List<String>; if this file is
generated, also fix the SDK codegen template that emits these fields so future
generations produce List<String> instead of raw List.
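A tiny stand-in class showing the effect of the suggested List<String> declarations; the Action class here is hypothetical, only the field names mirror the SDK:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CleanupAllVmInstanceMetadataAction with the
// generic declarations applied: wrong element types now fail at compile time.
public class TypedActionSketch {
    public List<String> primaryStorageUuids = new ArrayList<>();
    public List<String> systemTags = new ArrayList<>();
    public List<String> userTags = new ArrayList<>();

    public TypedActionSketch withPrimaryStorageUuids(List<String> uuids) {
        this.primaryStorageUuids = new ArrayList<>(uuids); // defensive copy
        return this;
    }

    public static void main(String[] args) {
        TypedActionSketch a = new TypedActionSketch()
                .withPrimaryStorageUuids(List.of("ps-1", "ps-2"));
        // a.primaryStorageUuids.add(42); // would not compile with List<String>
        System.out.println(a.primaryStorageUuids.size()); // 2
    }
}
```

With the raw List declarations, the commented-out `add(42)` would compile and only fail later at serialization or server-side validation.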
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`:
- Around line 2110-2113: The final failure reply in NfsPrimaryStorage (the block
that checks if idx >= connectedHosts.size() and calls
reply.setError(operr(...))) omits the root cause; capture the last
ErrorCode/Exception from the per-host retry loop (e.g., store it in a variable
like lastError or lastErr when individual attempts fail) and include its details
in the final reply.setError message (augment operr(...) with the
lastError.getDetails()/toString()). Apply the same change to the analogous block
around 2125-2129 so the reply contains the most recent failure information for
observability.

---

Duplicate comments:
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`:
- Around line 3635-3677: The reply never accumulates per-host success/failure
counts; update the loop around getConnectedLocalStorageHostUuids/While and the
LocalStorageHypervisorBackend.handle callbacks to increment counters and set
them on CleanupAllVmMetadataOnPrimaryStorageReply before bus.reply: create
atomic/volatile counters (e.g., int successCount/failCount or AtomicInteger)
outside the While, increment successCount in ReturnValueCompletion.success and
failCount in both the backend-prepare catch block and ReturnValueCompletion.fail
(also include error messages already added), and finally set
reply.setSuccessCount(successCount) and reply.setFailCount(failCount) (or the
reply fields that represent cleaned/failed counts) in the
WhileDoneCompletion.done before calling bus.reply(msg, reply).
- Around line 3637-3641: The current cleanup-all logic uses
getConnectedLocalStorageHostUuids() and returns early when that list is empty
(logger.warn + bus.reply(msg, reply)), which silently omits
disconnected-but-attached hosts; change it to first collect the full set of
hosts attached to this local primary storage (use the class's existing full-host
scan helper — e.g. a method like
getAllLocalStorageHostUuids()/getLocalStorageHostUuidsForPrimary or similar),
compute connected = getConnectedLocalStorageHostUuids(), then for every host in
the full set that is not in connected add a failure entry to the reply result
(marking those hosts as cleanup-failed) instead of returning early; continue
processing connected hosts as before and send bus.reply(msg, reply) only after
building failures for disconnected hosts and results for processed connected
hosts.

---

Nitpick comments:
In
`@header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java`:
- Line 10: The field failedPrimaryStorageUuids in class
APICleanupAllVmInstanceMetadataEvent should be default-initialized to an empty
list instead of null; update the declaration of failedPrimaryStorageUuids to
assign an empty List (e.g., Collections.emptyList() or new ArrayList<>()) so
responses always return [] when there are no failures, and add the necessary
import for Collections or ArrayList if required.

In `@sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java`:
- Around line 28-35: The three fields in
CleanupAllVmInstanceMetadataAction—primaryStorageUuids, systemTags, and
userTags—are declared as raw java.util.List and should be changed to use
generics (List<String>) to enforce compile-time type safety; update their
declarations to public java.util.List<String> primaryStorageUuids, public
java.util.List<String> systemTags, and public java.util.List<String> userTags,
add or adjust any necessary imports (java.util.List) and update any places that
construct or reference these fields to use List<String>; if this file is
generated, also fix the SDK codegen template that emits these fields so future
generations produce List<String> instead of raw List.

In `@testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy`:
- Around line 5609-5633: The new cleanupAllVmInstanceMetadata implementation in
ApiHelper.groovy should not be added by editing the generated artifact directly;
migrate it into the SDK generator/template (update the template that produces
ApiHelper to include the cleanupAllVmInstanceMetadata logic, including the
a.sessionId = Test.currentEnvSpec?.session?.uuid assignment, the a.apiId
assignment, wrapping the call with ApiPathTracker, and returning the value via
errorOut), then rerun the generation pipeline to produce testlib, ensuring no
further hand edits are made to
testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy so they cannot be
overwritten or cause maintenance drift.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 61a70adc-b9cd-4b7b-843e-b7899dca0e42

📥 Commits

Reviewing files that changed from the base of the PR and between 274665c and 8dab03c.

⛔ Files ignored due to path filters (1)
  • conf/serviceConfig/vmInstance.xml is excluded by !**/*.xml
📒 Files selected for processing (17)
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java
  • testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
✅ Files skipped from review due to trivial changes (2)
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy
🚧 Files skipped from review as they are similar to previous changes (4)
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java

Comment on lines +2110 to +2113
if (idx >= connectedHosts.size()) {
reply.setError(operr("failed to cleanup all vm metadata on NFS primary storage[uuid:%s] after trying %d connected host(s)",
self.getUuid(), connectedHosts.size()));
bus.reply(msg, reply);


⚠️ Potential issue | 🟡 Minor

The final failure error omits the root cause, weakening the API's observability.

When all retries fail, only a generic error is returned, so the caller cannot see the specific reason for the last failure. Consider including the details of the most recent ErrorCode in the final reply.setError(...).

🔧 Suggested change
-        cleanupAllOnHostWithFallback(msg, reply, connectedHosts, 0);
+        cleanupAllOnHostWithFallback(msg, reply, connectedHosts, 0, null);
     }
 
     private void cleanupAllOnHostWithFallback(CleanupAllVmMetadataOnPrimaryStorageMsg msg,
                                               CleanupAllVmMetadataOnPrimaryStorageReply reply,
-                                              List<HostInventory> connectedHosts, int idx) {
+                                              List<HostInventory> connectedHosts, int idx,
+                                              ErrorCode lastError) {
         if (idx >= connectedHosts.size()) {
-            reply.setError(operr("failed to cleanup all vm metadata on NFS primary storage[uuid:%s] after trying %d connected host(s)",
-                    self.getUuid(), connectedHosts.size()));
+            reply.setError(lastError == null
+                    ? operr("failed to cleanup all vm metadata on NFS primary storage[uuid:%s] after trying %d connected host(s)",
+                        self.getUuid(), connectedHosts.size())
+                    : operr("failed to cleanup all vm metadata on NFS primary storage[uuid:%s] after trying %d connected host(s), last error: %s",
+                        self.getUuid(), connectedHosts.size(), lastError.getDetails()));
             bus.reply(msg, reply);
             return;
         }
@@
             public void fail(ErrorCode errorCode) {
                 logger.warn(String.format("[MetadataCleanup] cleanAll: NFS ps[uuid:%s] failed on host[uuid:%s]: %s; trying next connected host",
                         self.getUuid(), hostUuid, errorCode));
-                cleanupAllOnHostWithFallback(msg, reply, connectedHosts, idx + 1);
+                cleanupAllOnHostWithFallback(msg, reply, connectedHosts, idx + 1, errorCode);
             }
         });
     }

Also applies to: 2125-2129

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`
around lines 2110 - 2113, The final failure reply in NfsPrimaryStorage (the
block that checks if idx >= connectedHosts.size() and calls
reply.setError(operr(...))) omits the root cause; capture the last
ErrorCode/Exception from the per-host retry loop (e.g., store it in a variable
like lastError or lastErr when individual attempts fail) and include its details
in the final reply.setError message (augment operr(...) with the
lastError.getDetails()/toString()). Apply the same change to the analogous block
around 2125-2129 so the reply contains the most recent failure information for
observability.
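The pattern the review asks for — sequential host fallback that threads the last ErrorCode through the recursion so the final failure carries a root cause — can be sketched outside ZStack. Everything below is an illustrative stand-in (no ZStack types; `attempt` returns an error string, or null on success):

```java
import java.util.List;
import java.util.function.Function;

// Minimal sketch of per-host fallback that carries the last error forward,
// so exhausting all hosts produces a message with the most recent root cause.
// All names here are hypothetical stand-ins, not ZStack APIs.
public class HostFallback {
    static String cleanupWithFallback(List<String> hosts, int idx, String lastError,
                                      Function<String, String> attempt) {
        if (idx >= hosts.size()) {
            // Exhausted: include the last per-host error if we have one.
            return lastError == null
                    ? "failed after trying " + hosts.size() + " host(s)"
                    : "failed after trying " + hosts.size() + " host(s), last error: " + lastError;
        }
        String error = attempt.apply(hosts.get(idx)); // null means success
        if (error == null) {
            return "cleaned on host " + hosts.get(idx);
        }
        // Recurse onto the next host, remembering this failure.
        return cleanupWithFallback(hosts, idx + 1, error, attempt);
    }

    public static void main(String[] args) {
        // Every host fails; the final message embeds the last root cause.
        String result = cleanupWithFallback(List.of("h1", "h2"), 0, null,
                host -> "agent unreachable on " + host);
        System.out.println(result);
    }
}
```

A real fix would pass an ErrorCode instead of a String, but the control flow is the same: the recursive call is the only place the "last error" needs to travel.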

@MatheMatrix MatheMatrix force-pushed the sync/tao.gan/add-APICleanupAllVmInstanceMetadataMsg-ZSV-11867@@3 branch from 8dab03c to 5d220d1 Compare April 29, 2026 12:40

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java (1)

3637-3678: ⚠️ Potential issue | 🔴 Critical

cleanup-all loses failure and cleanup statistics, distorting the upstream aggregated result.

The current implementation only iterates connectedHostUuids and never aggregates per-host return values into the reply (cleanedCount/failedCount); the exception branch also never explicitly accumulates failures. As a result, the reply statistics are undercounted or left at their defaults, and aggregation on the premium side becomes inaccurate.

Suggested fix (example)
 @Override
 protected void handle(final CleanupAllVmMetadataOnPrimaryStorageMsg msg) {
     CleanupAllVmMetadataOnPrimaryStorageReply reply = new CleanupAllVmMetadataOnPrimaryStorageReply();

-    List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
-    if (connectedHostUuids.isEmpty()) {
-        logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s]", self.getUuid()));
-        bus.reply(msg, reply);
-        return;
-    }
+    List<String> allHostUuids = getAllLocalStorageHostUuids();
+    List<String> connectedHostUuids = getConnectedLocalStorageHostUuids();
+    List<String> disconnectedHostUuids = new ArrayList<>(allHostUuids);
+    disconnectedHostUuids.removeAll(connectedHostUuids);
+
+    AtomicInteger totalCleaned = new AtomicInteger(0);
+    AtomicInteger totalFailed = new AtomicInteger(disconnectedHostUuids.size());
+
+    if (connectedHostUuids.isEmpty()) {
+        logger.warn(String.format("[MetadataCleanup] cleanAll: no connected host found for local ps[uuid:%s]", self.getUuid()));
+        reply.setCleanedCount(totalCleaned.get());
+        reply.setFailedCount(totalFailed.get());
+        bus.reply(msg, reply);
+        return;
+    }

     new While<>(connectedHostUuids).all((hostUuid, com) -> {
         final LocalStorageHypervisorBackend bkd;
         try {
             LocalStorageHypervisorFactory f = getHypervisorBackendFactoryByHostUuid(hostUuid);
             bkd = f.getHypervisorBackend(self);
         } catch (Exception e) {
             logger.warn(String.format("[MetadataCleanup] cleanAll: failed to prepare backend for host[uuid:%s] on ps[uuid:%s]: %s",
                     hostUuid, self.getUuid(), e.getMessage()));
+            totalFailed.incrementAndGet();
             com.addError(operr("host[uuid:%s] backend prepare failed: %s", hostUuid, e.getMessage()));
             com.done();
             return;
         }
         bkd.handle(msg, hostUuid, new ReturnValueCompletion<CleanupAllVmMetadataOnPrimaryStorageReply>(com) {
            @Override
             public void success(CleanupAllVmMetadataOnPrimaryStorageReply returnValue) {
+                totalCleaned.addAndGet(returnValue.getCleanedCount());
+                totalFailed.addAndGet(returnValue.getFailedCount());
                 com.done();
             }

            @Override
             public void fail(ErrorCode errorCode) {
                 logger.warn(String.format("[MetadataCleanup] cleanAll: failed on host[uuid:%s] on ps[uuid:%s]: %s",
                         hostUuid, self.getUuid(), errorCode));
+                totalFailed.incrementAndGet();
                 com.addError(operr("host[uuid:%s]: %s", hostUuid, errorCode.getDescription()));
                 com.done();
             }
         });
     }).run(new WhileDoneCompletion(msg) {
        @Override
         public void done(ErrorCodeList errorCodeList) {
+            reply.setCleanedCount(totalCleaned.get());
+            reply.setFailedCount(totalFailed.get());
             if (!errorCodeList.getCauses().isEmpty()) {
                 reply.setError(operr("local primary storage[uuid:%s] cleanAll failed on %d/%d host(s): %s",
                         self.getUuid(), errorCodeList.getCauses().size(), connectedHostUuids.size(), errorCodeList));
             }
             bus.reply(msg, reply);
         }
     });
 }
+
+private List<String> getAllLocalStorageHostUuids() {
+    return SQL.New(
+                    "select h.hostUuid from LocalStorageHostRefVO h where h.primaryStorageUuid = :psUuid", String.class)
+            .param("psUuid", self.getUuid())
+            .list();
+}

Also applies to: 3682-3691

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`
around lines 3637 - 3678, The cleanup currently iterates connectedHostUuids but
never aggregates per-host results into the reply; update the logic around
getConnectedLocalStorageHostUuids / While and bkd.handle so each host outcome
increments reply's cleanedCount or failedCount (and records failures/errors),
including the exception branch where getHypervisorBackend(...) fails; in the
ReturnValueCompletion.success() increment cleanedCount, in fail() and the catch
block increment failedCount and attach per-host error info (or add to a failure
list), and before bus.reply(msg, reply) set reply fields (cleanedCount,
failedCount and any error details) so upstream aggregation sees accurate
statistics (apply same fix to the adjacent block around lines 3682-3691 that
uses similar flow).
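The aggregation this comment asks for reduces to a small pattern: shared atomic counters summed across concurrent per-host results before the reply is built. The `Result` record and `aggregate` helper below are hypothetical stand-ins for the ZStack reply types, not project code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of thread-safe aggregation of per-host cleanup results into
// total counters, mirroring the While<> fan-out in the review's diff.
// Result and aggregate are illustrative, not ZStack APIs.
public class CleanupAggregation {
    record Result(boolean ok, int cleaned, int failed) {}

    static int[] aggregate(List<Result> perHost) {
        AtomicInteger totalCleaned = new AtomicInteger();
        AtomicInteger totalFailed = new AtomicInteger();
        perHost.parallelStream().forEach(r -> {
            if (r.ok()) {
                totalCleaned.addAndGet(r.cleaned());
                totalFailed.addAndGet(r.failed());
            } else {
                // A host that failed outright still counts as one failure,
                // matching the "increment failedCount in fail()" guidance.
                totalFailed.incrementAndGet();
            }
        });
        return new int[]{totalCleaned.get(), totalFailed.get()};
    }

    public static void main(String[] args) {
        int[] totals = aggregate(List.of(
                new Result(true, 3, 1),
                new Result(false, 0, 0),
                new Result(true, 2, 0)));
        System.out.println(totals[0] + " cleaned, " + totals[1] + " failed");
    }
}
```

AtomicInteger matters here because the per-host completions run concurrently; plain `int` accumulators captured in lambdas would not compile, and boxed counters would race.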
plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java (1)

2200-2227: ⚠️ Potential issue | 🟠 Major

The success branch does not write back the cleaned/failed statistics, distorting upstream aggregation.

On success an empty CleanupAllVmMetadataOnPrimaryStorageReply is currently returned, losing the cleanup statistics the agent already produced. Consider writing the statistics fields back in the success branch.

🔧 Suggested fix
                 CleanupAllVmMetadataOnPrimaryStorageReply r = new CleanupAllVmMetadataOnPrimaryStorageReply();
+                r.setCleanedCount(rsp.cleanedCount);
+                r.setFailedCount(rsp.failedCount);
                 completion.success(r);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java`
around lines 2200 - 2227, The success branch in
NfsPrimaryStorageKVMBackend.handle does not propagate the cleanup statistics
from CleanupAllVmMetadataRsp into the reply, causing upstream aggregation to be
incorrect; update the success path so that after casting the response
(CleanupAllVmMetadataRsp rsp) you populate a
CleanupAllVmMetadataOnPrimaryStorageReply instance with the rsp's cleaned and
failed counts (or equivalent fields) before calling completion.success(r) so the
caller receives the agent-produced statistics.
plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java (2)

2098-2104: ⚠️ Potential issue | 🟠 Major

When no connected host is available, the bulk cleanup should not be judged an outright failure.

Calling setError(...) directly at Line 2099 escalates this best-effort metadata cleanup into a hard failure; it is safer to log a warning and return cleanedCount=0 and failedCount=0, letting the upper layer keep aggregating by the statistics. Based on learnings, VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...) failures are commonly treated as best-effort and do not block the primary operation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`
around lines 2098 - 2104, Don't treat "no connected host" as hard failure: in
the block that calls factory.getConnectedHostForOperation(getSelfInventory())
(around NfsPrimaryStorage), replace setting a fatal error via
reply.setError(...) and returning with a bus.reply by instead logging or
alerting the condition, set reply.cleanedCount = 0 and reply.failedCount = 0
(and ensure no error is set on reply), call bus.reply(msg, reply), and return;
keep subsequent cleanupAllOnHostWithFallback(...) for when connectedHosts is
non-empty. This preserves best-effort semantics used by
VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...) while still
reporting counts to the caller.

2110-2128: ⚠️ Potential issue | 🟡 Minor

The final failure reply needs to carry the last root cause.

Line 2111 only reports how many hosts were tried, without passing the last ErrorCode from Line 2125 through to the caller, which leaves too little for troubleshooting. Consider threading the most recent failure down the recursion and appending lastError.getDetails() in the final reply.setError(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`
around lines 2110 - 2128, The final failure path currently omits the last host
ErrorCode details; update cleanupAllOnHostWithFallback to accept an additional
ErrorCode lastError parameter, set lastError when
backend.handle(...).fail(ErrorCode errorCode) is called, pass that error into
the recursive call (cleanupAllOnHostWithFallback(msg, reply, connectedHosts, idx
+ 1, errorCode)), and when idx >= connectedHosts.size() build the reply error
using both the existing operr(...) message and lastError (e.g., include
lastError.getDetails() or attach lastError to reply.setError) so the caller
receives the root cause from the last attempt.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java`:
- Around line 953-958: The response class CleanupAllVmMetadataRsp should carry
the cleanup metrics so callers can observe results: add integer fields
cleanedCount and failedCount (with getters/setters or public fields as your
style) to CleanupAllVmMetadataRsp, populate those fields when constructing the
reply in the cleanup handler that processes CleanupAllVmMetadataCmd (ensure both
success and partial-failure paths set the counts), and return that populated
CleanupAllVmMetadataRsp instead of an empty reply; apply the same change to the
other identical spot referenced (the duplicate block at 3966-3983) so both reply
types convey cleanedCount/failedCount upstream.

---

Duplicate comments:
In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java`:
- Around line 3637-3678: The cleanup currently iterates connectedHostUuids but
never aggregates per-host results into the reply; update the logic around
getConnectedLocalStorageHostUuids / While and bkd.handle so each host outcome
increments reply's cleanedCount or failedCount (and records failures/errors),
including the exception branch where getHypervisorBackend(...) fails; in the
ReturnValueCompletion.success() increment cleanedCount, in fail() and the catch
block increment failedCount and attach per-host error info (or add to a failure
list), and before bus.reply(msg, reply) set reply fields (cleanedCount,
failedCount and any error details) so upstream aggregation sees accurate
statistics (apply same fix to the adjacent block around lines 3682-3691 that
uses similar flow).

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java`:
- Around line 2098-2104: Don't treat "no connected host" as hard failure: in the
block that calls factory.getConnectedHostForOperation(getSelfInventory())
(around NfsPrimaryStorage), replace setting a fatal error via
reply.setError(...) and returning with a bus.reply by instead logging or
alerting the condition, set reply.cleanedCount = 0 and reply.failedCount = 0
(and ensure no error is set on reply), call bus.reply(msg, reply), and return;
keep subsequent cleanupAllOnHostWithFallback(...) for when connectedHosts is
non-empty. This preserves best-effort semantics used by
VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...) while still
reporting counts to the caller.
- Around line 2110-2128: The final failure path currently omits the last host
ErrorCode details; update cleanupAllOnHostWithFallback to accept an additional
ErrorCode lastError parameter, set lastError when
backend.handle(...).fail(ErrorCode errorCode) is called, pass that error into
the recursive call (cleanupAllOnHostWithFallback(msg, reply, connectedHosts, idx
+ 1, errorCode)), and when idx >= connectedHosts.size() build the reply error
using both the existing operr(...) message and lastError (e.g., include
lastError.getDetails() or attach lastError to reply.setError) so the caller
receives the root cause from the last attempt.

In
`@plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java`:
- Around line 2200-2227: The success branch in
NfsPrimaryStorageKVMBackend.handle does not propagate the cleanup statistics
from CleanupAllVmMetadataRsp into the reply, causing upstream aggregation to be
incorrect; update the success path so that after casting the response
(CleanupAllVmMetadataRsp rsp) you populate a
CleanupAllVmMetadataOnPrimaryStorageReply instance with the rsp's cleaned and
failed counts (or equivalent fields) before calling completion.success(r) so the
caller receives the agent-produced statistics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9081a764-c3f8-45eb-8b1c-2a17b2c73532

📥 Commits

Reviewing files that changed from the base of the PR and between 8dab03c and 5d220d1.

⛔ Files ignored due to path filters (1)
  • conf/serviceConfig/vmInstance.xml is excluded by !**/*.xml
📒 Files selected for processing (17)
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageBase.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageHypervisorBackend.java
  • plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorage.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • storage/src/main/java/org/zstack/storage/primary/PrimaryStorageBase.java
  • testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
✅ Files skipped from review due to trivial changes (5)
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataResult.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageReply.java
  • header/src/main/java/org/zstack/header/storage/primary/CleanupAllVmMetadataOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEventDoc_zh_cn.groovy
  • sdk/src/main/java/org/zstack/sdk/CleanupAllVmInstanceMetadataAction.java
🚧 Files skipped from review as they are similar to previous changes (5)
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageBackend.java
  • plugin/nfsPrimaryStorage/src/main/java/org/zstack/storage/primary/nfs/NfsPrimaryStorageKVMBackendCommands.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsgDoc_zh_cn.groovy
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataMsg.java
  • header/src/main/java/org/zstack/header/vm/APICleanupAllVmInstanceMetadataEvent.java

Comment on lines +953 to +958
public static class CleanupAllVmMetadataCmd extends AgentCommand {
public String metadataDir;
}

public static class CleanupAllVmMetadataRsp extends AgentResponse {
}

⚠️ Potential issue | 🟠 Major

The cleanup-all response neither carries nor returns the statistics fields, so the result is insufficiently observable.

CleanupAllVmMetadataRsp is empty and the success branch returns an empty reply, so cleanedCount/failedCount cannot be passed through to upstream aggregation.

🔧 Suggested fix
     public static class CleanupAllVmMetadataRsp extends AgentResponse {
+        public int cleanedCount;
+        public int failedCount;
     }
             public void success(CleanupAllVmMetadataRsp rsp) {
                 CleanupAllVmMetadataOnPrimaryStorageReply reply = new CleanupAllVmMetadataOnPrimaryStorageReply();
+                reply.setCleanedCount(rsp.cleanedCount);
+                reply.setFailedCount(rsp.failedCount);
                 completion.success(reply);
             }

Also applies to: 3966-3983

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/localstorage/src/main/java/org/zstack/storage/primary/local/LocalStorageKvmBackend.java`
around lines 953 - 958, The response class CleanupAllVmMetadataRsp should carry
the cleanup metrics so callers can observe results: add integer fields
cleanedCount and failedCount (with getters/setters or public fields as your
style) to CleanupAllVmMetadataRsp, populate those fields when constructing the
reply in the cleanup handler that processes CleanupAllVmMetadataCmd (ensure both
success and partial-failure paths set the counts), and return that populated
CleanupAllVmMetadataRsp instead of an empty reply; apply the same change to the
other identical spot referenced (the duplicate block at 3966-3983) so both reply
types convey cleanedCount/failedCount upstream.
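The shape of this fix — a response type that carries counts which the backend copies into the higher-level reply instead of returning an empty one — is small enough to sketch generically. The class names below echo the review's suggestion, but the types themselves are illustrative stand-ins, not ZStack code:

```java
// Sketch of propagating agent-produced counts into an upstream reply.
// Both nested classes are hypothetical stand-ins for the real types.
public class CountPropagation {
    static class CleanupAllVmMetadataRsp {
        public int cleanedCount;
        public int failedCount;
    }

    static class CleanupReply {
        private int cleanedCount;
        private int failedCount;
        void setCleanedCount(int c) { cleanedCount = c; }
        void setFailedCount(int f) { failedCount = f; }
        int getCleanedCount() { return cleanedCount; }
        int getFailedCount() { return failedCount; }
    }

    // The success branch copies the agent statistics field by field,
    // so upstream aggregation sees real numbers rather than defaults.
    static CleanupReply toReply(CleanupAllVmMetadataRsp rsp) {
        CleanupReply r = new CleanupReply();
        r.setCleanedCount(rsp.cleanedCount);
        r.setFailedCount(rsp.failedCount);
        return r;
    }

    public static void main(String[] args) {
        CleanupAllVmMetadataRsp rsp = new CleanupAllVmMetadataRsp();
        rsp.cleanedCount = 7;
        rsp.failedCount = 2;
        CleanupReply reply = toReply(rsp);
        System.out.println(reply.getCleanedCount() + "/" + reply.getFailedCount());
    }
}
```

The same two-line copy would go in both the LocalStorage and NFS success branches, since both sites currently construct an empty reply.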

Add APICleanupAllVmInstanceMetadata channel: API/Event/SDK,
internal CleanupAllVmMetadataOnPrimaryStorageMsg, and
LocalStorage/NFS PS handlers with cleanup-all path.
Existing single-VM cleanup channel is untouched.

Resolves: ZSV-11867

Change-Id: I66667361746274616d6879666a6479796f797174
@MatheMatrix MatheMatrix force-pushed the sync/tao.gan/add-APICleanupAllVmInstanceMetadataMsg-ZSV-11867@@3 branch from 5d220d1 to b4c75ab Compare April 29, 2026 12:59

3 participants