feat: network-policy — auto-switch select groups based on network environment#2729

Open
wangwei354 wants to merge 12 commits into MetaCubeX:Alpha from wangwei354:feature/network-policy

@wangwei354 wangwei354 commented Apr 19, 2026

Update (Apr 22, 2026): This PR has been force-pushed to replace the original single-primary-iface design with a clean-slate multi-interface rewrite. The original design failed to express common desktop scenarios (wired + Wi-Fi with different gateways; Wi-Fi + VPN coexistence; "in the office" when both Ethernet and Wi-Fi are up). Design discussion is at #2722. No reviewer time had been consumed on the old revision (0 formal review comments), so the in-place rewrite is the least disruptive path. If you'd prefer a fresh PR number for a clean review history, just say so and I'll close + reopen.

TL;DR

English: Introduces a top-level networks: YAML block and a per-select-group network-policy: mapping, exposed through PUT / DELETE / GET /network/context. A GUI host pushes a NetworkContext containing the full active-interface set; the kernel evaluates the rule list in order, applies the first-match's policy target on each affected group, and preserves user manual picks within the same network. Zero behavioral change when networks: is absent.

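Chinese (translated): Adds a top-level networks: YAML field and a network-policy: subfield on select groups, exposed through the three REST endpoints PUT / DELETE / GET /network/context. The host pushes a NetworkContext containing the full set of active interfaces; the kernel evaluates the networks: list in order with first-match semantics and applies the corresponding target to each select group that carries a policy. Manual user picks are respected within the same network, and the automatic flow takes over when the network changes. When networks: is not configured, behaviour is byte-for-byte identical to earlier versions.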

Closes MetaCubeX/mihomo#1330. Design RFC: #2722. Pairs with clash-verge-rev#1231 on the GUI side.

Why the force-push (what changed from the previous revision)

The earlier five-commit revision modeled NetworkContext as a single "primary interface" snapshot with fields like primary_iface / iface_type / ssid / gateway_mac at the top level. On desktop that proves too narrow:

  • Multi-NIC is the common case (wired + Wi-Fi together), and the host has no principled way to pick "the" primary. Whichever one it picks throws away the other's identity.
  • {iface-type: vpn, ...} rules could never match a user-installed WireGuard that runs alongside Wi-Fi, because the host's "physical-iface-fallback" would hide the VPN when the VPN wasn't the primary.
  • Combined conditions like "home Wi-Fi + company VPN" are inexpressible in a single-iface schema.

The v2 design in this PR:

  • Host reports all active interfaces (interfaces[], hard-capped at 32), plus a top-level dns_suffix list.
  • Matchers evaluate with ∃iface ∈ interfaces semantics; multiple per-iface fields in the same block must be satisfied by the same iface (atomic AND). any: / all: / not: combinators are grammar primitives.
  • Per-network precedence is expressed by rule order in networks: (first-match wins). The "primary interface" concept is gone.
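
The ∃iface and atomic-AND rules above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code in component/networkpolicy/: the Iface and Block types and the Match method are hypothetical, and only the iface-type and ssid fields are modeled.

```go
package main

import "fmt"

// Iface is a hypothetical, trimmed-down stand-in for one interfaces[] entry.
type Iface struct {
	Name, Type, SSID string
}

// Block is a hypothetical matcher block. Type and SSID are per-iface fields
// (empty means "not specified"); All, Any and Not are sub-block combinators.
type Block struct {
	Type string
	SSID []string // field-level OR
	All  []Block
	Any  []Block
	Not  *Block
}

// ifaceOK reports whether one interface satisfies every per-iface field.
func (b Block) ifaceOK(i Iface) bool {
	if b.Type != "" && i.Type != b.Type {
		return false
	}
	if len(b.SSID) > 0 {
		hit := false
		for _, s := range b.SSID {
			if s == i.SSID {
				hit = true
			}
		}
		if !hit {
			return false
		}
	}
	return true
}

// Match evaluates the block against the full interface set.
func (b Block) Match(ifaces []Iface) bool {
	// Per-iface fields: some single interface must satisfy all of them.
	if b.Type != "" || len(b.SSID) > 0 {
		found := false
		for _, i := range ifaces {
			if b.ifaceOK(i) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	// all: every sub-block must match, each possibly on a different iface.
	for _, sub := range b.All {
		if !sub.Match(ifaces) {
			return false
		}
	}
	// any: at least one sub-block must match.
	if len(b.Any) > 0 {
		ok := false
		for _, sub := range b.Any {
			if sub.Match(ifaces) {
				ok = true
				break
			}
		}
		if !ok {
			return false
		}
	}
	// not: the sub-block must not match.
	return b.Not == nil || !b.Not.Match(ifaces)
}

func main() {
	ifaces := []Iface{{"eth0", "ethernet", ""}, {"wlan0", "wifi", "home-network"}}
	sameIface := Block{Type: "ethernet", SSID: []string{"home-network"}}
	crossIface := Block{All: []Block{{Type: "ethernet"}, {SSID: []string{"home-network"}}}}
	fmt.Println(sameIface.Match(ifaces), crossIface.Match(ifaces)) // false true
}
```

The first block fails because no single interface is both ethernet and on home-network; the second succeeds because each all: sub-block gets its own ∃iface search.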

The architecture is intentionally conservative: no CGO, no new platform dependencies, no subscription-pushed policy (signed YAML stays the source of truth).

High-level design

The functionality splits naturally into two halves:

| Concern | Owner | Why |
| --- | --- | --- |
| Platform-specific network detection (WlanAPI, NWPathMonitor, netlink, NetworkCallback) | GUI host | Rust / Kotlin ecosystems already have first-class APIs; deep OS event subscriptions; desktop-app territory |
| Rule semantics, select group state, persistence, REST | mihomo kernel (this PR) | Same concepts as store-selected; needs to live with proxy groups; YAML rules should travel with the subscription |

Rules live in YAML so they ride subscriptions, backups, and are visible to any GUI that implements the detector. Host and kernel communicate through a single versioned REST contract (/network/context). No new platform APIs, CGO, or third-party deps in the kernel.

What's in this PR

Twelve commits, layered so each one compiles, passes go vet, and keeps existing tests green:

  1. feat(network-policy): add package skeleton with multi-interface schema — new component/networkpolicy/ package: NetworkContext (interfaces[] + global dns_suffix + optional ttl), NormalizeAndValidate, stable FNV-64a Fingerprint, the block-level matcher AST with any: / all: / not: combinators and ∃iface semantics, GroupPolicy, shared constants. Pure logic.
  2. feat(network-policy): parse networks and network-policy in config — wires the schema into config.Parse: ParseNetworks builds the list, ParseGroupPolicy validates each select group's Mapping (static proxies fail-fast, provider-expanded targets tolerant-until-first-PUT, default as reserved key). Selector gains NetworkPolicy() / GroupSource() getters. No runtime behaviour yet.
  3. feat(network-policy): persist bucketNetworkPolicy — new bucketNetworkPolicy bucket in component/profile/cachefile stores {source, last_matched_network} per group (proxy name stays in the existing bucketSelected). Decode path tolerates schema drift (unknown schema_version → treated as branch B, cold start).
  4. feat(network-policy): add manager with state machine and TTL — the Manager singleton: per-group state (source + last_matched), global ctx snapshot + TTL timer with generation-stamped stale-callback protection, single-serial-queue evaluation pipeline (evaluate → selector.Set → cachefile → publish). TTL-light-path handles heartbeat renewals under five guard conditions so steady state doesn't hit the full pipeline. Provider barrier on startup prevents races with subscription-driven candidate sets.
  5. feat(network-policy): wire manager into executor lifecycle: hub/executor/executor.go owns Install / Uninstall via ApplyConfig (the package exports only these two; a single Install handles both first-time install and hot-reload state migration via the internal inheritFrom). Hot-reload migrates per-group state + ctx + TTL timer + selected proxy from old to new Manager. collectNetworkPolicySelectors uses the SelectorWithPolicy interface (no concrete type assertions).
  6. feat(network-policy): expose /network/context REST endpoints: hub/route/network.go registers PUT / DELETE / GET under /network/context, with a 10 MiB body cap, strict-end JSON parsing (no trailing junk), and sentinel-error classification that maps validation errors to the {code, message} envelope (see §REST API).
  7. feat(network-policy): hook manual PUT /proxies/:name into state machine: updateProxy calls Manager.HandleManualSet(name) after selector.Set returns; respects lock order globalMu → selector.mu → manager.mu (hook runs outside selector.mu). Select groups without network-policy keep byte-for-byte compatible behavior.
  8. feat(network-policy): warn on exposed external-controller without secret — startup warning when networks: is configured, external-controller is bound to a non-loopback address, and no secret is set (TLS with strong mutual-auth exempts the warning).
  9. refactor(network-policy): narrow exported API surface — package-private where possible; only SelectorWithPolicy / the sentinel error values / NetworkContext / Install / a few lifecycle exports remain.
  10. docs: document network-policy config and REST API: docs/config.yaml gains the networks: + network-policy: comment-only example; docs/api.md is a new comprehensive REST reference (covers every endpoint in hub/route/, including the /network/context contract in full: the wire schema, the applied[] shape, the seven-reason enumeration, the manual-wins state machine including the missing_target branch, the tri-state wire-null encoding for matched_network / last_matched_network, and the complete {code, message} error code matrix).
  11. feat(tun): expose actual bound device name in /configs response — tiny fd-override fix in listener/sing_tun/server.go: when FileDescriptor > 0, after getTunnelName(fd) succeeds, mirror the resolved name into options.Device so tunLister.Config().Device and therefore GET /configs.tun.device reflect the real bound interface. Host samplers rely on this to filter mihomo's own TUN from the NetworkContext they push.
  12. docs(network-policy): fix dangling reference in config.yaml: post-review follow-up. The comment block added in commit 10 pointed at docs/network-policy.md, which was never committed; it now points at docs/api.md §11, where the full field list / combinator semantics / REST contract already live.
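
As an illustration of the stable FNV-64a fingerprint mentioned in commit 1, here is a minimal Go sketch assuming a reduced Iface type with only three fields. The writeField and fingerprint helpers are hypothetical; the exact field set, key names and byte layout used by the PR may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"io"
	"sort"
)

// Iface is a hypothetical, reduced interface record.
type Iface struct {
	Name, Type, SSID string
}

// writeField hashes key and value, each preceded by its length, so that
// ("ab", "c") and ("a", "bc") cannot collide through plain concatenation.
func writeField(w io.Writer, key, val string) {
	var n [8]byte
	for _, s := range []string{key, val} {
		binary.BigEndian.PutUint64(n[:], uint64(len(s)))
		w.Write(n[:])
		w.Write([]byte(s))
	}
}

// fingerprint is independent of the order in which the host lists interfaces.
func fingerprint(ifaces []Iface) uint64 {
	sorted := append([]Iface(nil), ifaces...) // copy so the caller's order is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	h := fnv.New64a()
	writeField(h, "version", "1") // literal, so the hash does not depend on a version field
	for idx, it := range sorted {
		prefix := fmt.Sprintf("iface.%d.", idx)
		writeField(h, prefix+"iface_type", it.Type)
		writeField(h, prefix+"name", it.Name)
		writeField(h, prefix+"ssid", it.SSID)
	}
	return h.Sum64()
}

func main() {
	a := []Iface{{"wlan0", "wifi", "corp-5g"}, {"eth0", "ethernet", ""}}
	b := []Iface{a[1], a[0]} // same interfaces, different order
	fmt.Println(fingerprint(a) == fingerprint(b)) // true
}
```

Sorting before hashing is what lets a re-PUT of the same network compare equal even when the host enumerates interfaces in a different order.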

Configuration

Minimal — one network, one group:

networks:
  - name: office
    match:
      ssid: corp-5g
      iface-type: wifi

proxy-groups:
  - name: Smart
    type: select
    proxies: [hk, us, DIRECT]
    network-policy:
      office: hk
      default: DIRECT     # matched-nothing or matched-but-no-mapping fallback

Richer grammar (per-iface atomic AND, field-level OR, sub-block combinators):

networks:
  # Ethernet with a specific gateway MAC (office wired)
  - name: office-wired
    match:
      iface-type: ethernet
      gateway-mac: "aa:bb:cc:dd:ee:00"

  # Wi-Fi to any AP in the office set (field-level OR)
  - name: office-wifi
    match:
      ssid: [corp-5g, corp-2.4g, corp-guest]
      iface-type: wifi

  # Home Wi-Fi + company VPN simultaneously (cross-iface AND)
  - name: home-with-corp-vpn
    match:
      all:
        - iface-type: wifi
          ssid: home-network
        - iface-type: vpn
          dns-suffix: corp.example.com   # dns-suffix is global, not bound to this sub-block

  # Any cellular / wwan interface (`not:` for the complement)
  - name: no-cellular
    match:
      not:
        any:
          - iface-type: cellular
          - iface-type: wwan

Full field list, combinator semantics (including the "same iface satisfies all top-level per-iface fields" atomic rule), normalization, and the dns-suffix global-scope caveat are in docs/api.md §11 and docs/config.yaml (the committed example block).
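
To make the first-match and default-fallback behaviour concrete, here is a small Go sketch. It assumes a toy Context that only carries SSIDs, and the Network, firstMatch and resolve names are hypothetical; the real matcher grammar is far richer.

```go
package main

import "fmt"

// Context is a toy stand-in for NetworkContext: just the SSIDs in view.
type Context struct {
	SSIDs []string
}

// Network pairs a name with a compiled matcher (hypothetical shape).
type Network struct {
	Name  string
	Match func(Context) bool
}

// hasSSID builds a matcher that accepts contexts containing the given SSID.
func hasSSID(want string) func(Context) bool {
	return func(c Context) bool {
		for _, s := range c.SSIDs {
			if s == want {
				return true
			}
		}
		return false
	}
}

// firstMatch returns the first network, in list order, whose matcher accepts
// ctx, or "" when nothing matches.
func firstMatch(networks []Network, ctx Context) string {
	for _, n := range networks {
		if n.Match(ctx) {
			return n.Name
		}
	}
	return ""
}

// resolve looks up the group's target for the matched network and falls back
// to the reserved "default" key for both matched-nothing and
// matched-but-no-mapping. ok is false when there is no applicable entry.
func resolve(policy map[string]string, matched string) (target string, ok bool) {
	if matched != "" {
		if t, found := policy[matched]; found {
			return t, true
		}
	}
	t, found := policy["default"]
	return t, found
}

func main() {
	networks := []Network{
		{"office", hasSSID("corp-5g")},
		{"home", hasSSID("home-network")},
	}
	policy := map[string]string{"office": "hk", "default": "DIRECT"}
	for _, ctx := range []Context{{[]string{"corp-5g"}}, {[]string{"home-network"}}, {}} {
		m := firstMatch(networks, ctx)
		t, _ := resolve(policy, m)
		fmt.Printf("%q -> %s\n", m, t)
	}
}
```

With this policy, the office context resolves to hk, while both the unmapped home network and the empty context fall through to DIRECT.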

REST API

| Method | Path | Purpose |
| --- | --- | --- |
| PUT | /network/context | Host pushes the current active-interface snapshot; returns matched_network + per-group applied[] |
| DELETE | /network/context | Clear the cached context (204); per-group state is preserved (does not trigger default) |
| GET | /network/context | Normalized snapshot of the context, current matched_network, per-group current_proxy / selection_source / last_matched_network |

curl -X PUT \
  -H "Authorization: Bearer $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
        "version": 1,
        "interfaces": [
          {"name":"wlan0","iface_type":"wifi","ssid":"corp-5g","gateway_ip":"10.0.0.1"}
        ],
        "dns_suffix": ["corp.example.com"],
        "ttl": 1800
      }' \
  http://127.0.0.1:9090/network/context

Error responses on /network/context use a structured envelope (distinct from other mihomo endpoints' {"message":"..."}):

{ "code": "invalid_field", "message": "field: interfaces[0].gateway_ip, reason: parse error" }

Stable code values: malformed_body / invalid_version / invalid_ttl / too_many_interfaces / duplicate_iface_name / invalid_field / invalid_gateway_combo / internal_error. Full schema, normalization rules, and per-code semantics are in docs/api.md §11.
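
A handler-side sketch of the body cap, strict-end parsing and the {code, message} envelope might look like the following. It is an illustration only: decodeStrict, codeFor, putBody and apiError are hypothetical names, only two validation rules are shown, and which code each condition maps to is an assumption rather than the PR's documented matrix.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// apiError mirrors the {code, message} envelope.
type apiError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// putBody is a reduced wire body; unknown fields are ignored by default.
type putBody struct {
	Version    int               `json:"version"`
	Interfaces []json.RawMessage `json:"interfaces"`
}

const (
	maxBodyBytes  = 10 << 20 // 10 MiB read cap
	maxInterfaces = 32
)

// decodeStrict parses exactly one JSON value and rejects trailing data.
func decodeStrict(r io.Reader) (*putBody, *apiError) {
	dec := json.NewDecoder(io.LimitReader(r, maxBodyBytes))
	var body putBody
	if err := dec.Decode(&body); err != nil {
		return nil, &apiError{"malformed_body", err.Error()}
	}
	// A second Decode must report a clean EOF; anything else is trailing junk.
	var extra json.RawMessage
	if err := dec.Decode(&extra); !errors.Is(err, io.EOF) {
		return nil, &apiError{"malformed_body", "unexpected data after JSON value"}
	}
	if body.Version != 1 {
		return nil, &apiError{"invalid_version", "version must be 1"}
	}
	if len(body.Interfaces) > maxInterfaces {
		return nil, &apiError{"too_many_interfaces", "at most 32 interfaces"}
	}
	return &body, nil
}

// codeFor returns the error code for a raw body, or "ok" when it is accepted.
func codeFor(raw string) string {
	if _, e := decodeStrict(strings.NewReader(raw)); e != nil {
		return e.Code
	}
	return "ok"
}

func main() {
	fmt.Println(codeFor(`{"version":1,"interfaces":[]}`))          // ok
	fmt.Println(codeFor(`{"version":1,"interfaces":[]} trailing`)) // malformed_body
	out, _ := json.Marshal(apiError{"invalid_version", "version must be 1"})
	fmt.Println(string(out))
}
```

A host can branch on the code field and treat message as human-readable detail only.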

Runtime semantics

Manual-wins is hard-coded. When the user hand-picks a proxy on a policy-bearing group, the group's selection_source flips to manual. Subsequent PUT /network/context behavior:

  • matched_network unchanged + source=manual → reason=manual_locked, user's pick preserved.
  • matched_network changed + evaluation lands in matched / already_selected / default / no_change_no_default → auto flow takes over, source resets to auto, last_matched_network advances.
  • matched_network changed but evaluation lands in missing_target (policy-resolved target not currently in the group's candidates, e.g., a subscription-provided node not yet loaded) → skip the switch, source and last_matched_network preserved, retried on the next PUT.

Seven reason values surface on applied[i].reason: matched / already_selected / default / no_change_no_default / unchanged_network (fingerprint-stable skip for the auto path) / manual_locked / missing_target.
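
The manual-wins transitions can be summarized with a small Go sketch. The State type and the evaluate / step functions are hypothetical; the fingerprint-based unchanged_network skip is deliberately left out, and the relative ordering of the missing_target and already_selected checks is an assumption.

```go
package main

import "fmt"

// State is a hypothetical per-group record.
type State struct {
	Source      string // "auto" or "manual"
	LastMatched string
	Current     string
}

// evaluate resolves the policy target for the matched network and classifies
// the outcome. It returns the proxy to select and the reason.
func evaluate(policy map[string]string, matched, current string, candidates map[string]bool) (string, string) {
	target, ok := policy[matched]
	reason := "matched"
	if !ok {
		target, ok = policy["default"]
		reason = "default"
	}
	if !ok {
		return current, "no_change_no_default"
	}
	if !candidates[target] {
		return current, "missing_target"
	}
	if target == current {
		return current, "already_selected"
	}
	return target, reason
}

// step applies one PUT /network/context result to a single group.
func step(st State, policy map[string]string, matched string, candidates map[string]bool) (State, string) {
	// Same network and a manual pick: the user's choice stays.
	if st.Source == "manual" && matched == st.LastMatched {
		return st, "manual_locked"
	}
	target, reason := evaluate(policy, matched, st.Current, candidates)
	// Target not loaded yet: keep everything and retry on the next PUT.
	if reason == "missing_target" {
		return st, reason
	}
	// Any other outcome hands control back to the auto flow.
	return State{Source: "auto", LastMatched: matched, Current: target}, reason
}

func main() {
	policy := map[string]string{"office": "hk", "default": "DIRECT"}
	candidates := map[string]bool{"hk": true, "us": true, "DIRECT": true}

	st := State{Source: "manual", LastMatched: "office", Current: "us"}
	st, reason := step(st, policy, "office", candidates)
	fmt.Println(reason, st.Current) // manual_locked us

	st, reason = step(st, policy, "", candidates) // host left every known network
	fmt.Println(reason, st.Source, st.Current)    // default auto DIRECT
}
```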

DELETE /network/context intentionally preserves per-group state (selection_source, last_matched_network, currently-selected proxy). It only clears the kernel's cached NetworkContext snapshot so the TTL timer can be cancelled. Use cases: host clean shutdown; deliberate "no context, but keep state machine" mode; natural TTL expiry (equivalent). Do not use DELETE to express "host is offline" — instead PUT {version:1, interfaces:[]}, which runs the default / no_change_no_default branch for every group.

State persists across restarts via two buckets: bucketSelected (unchanged semantics — "which proxy was selected") + new bucketNetworkPolicy ({schema_version, source, last_matched_network} — "how that selection was made"). Both gated by profile.store-selected: true, consistent with existing behaviour.

A TTL "light path" skips the full evaluation pipeline when five conditions all hold: the fingerprint is unchanged, the current body carries a TTL, the previous body carried a TTL, no group has a pending missing_target retry, and the candidate set hasn't drifted since the last full evaluation. Hosts doing periodic keep-alive PUTs therefore skip evaluation, selector updates, and cachefile writes in steady state.
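
Read as a predicate, the light-path guard could look like the sketch below. It treats "both current and previous bodies carried a TTL" as two separate conditions to arrive at five; the putInfo and managerState types, and the generation counters used to detect candidate drift, are hypothetical.

```go
package main

import "fmt"

// putInfo is what the guard needs to know about one PUT body.
type putInfo struct {
	Fingerprint uint64
	HasTTL      bool
}

// managerState is a hypothetical slice of Manager state relevant to the guard.
type managerState struct {
	Last                 putInfo // previous accepted PUT
	PendingMissingTarget bool    // some group is waiting to retry a missing target
	CandidateGen         uint64  // bumped whenever any group's candidate set changes
	EvaluatedGen         uint64  // CandidateGen at the last full evaluation
}

// lightPath reports whether a PUT may only renew the TTL timer.
func lightPath(m managerState, cur putInfo) bool {
	return cur.Fingerprint == m.Last.Fingerprint && // 1. same network fingerprint
		cur.HasTTL && // 2. current body carries a TTL
		m.Last.HasTTL && // 3. previous body carried a TTL
		!m.PendingMissingTarget && // 4. nothing waiting for a retry
		m.CandidateGen == m.EvaluatedGen // 5. no candidate drift
}

func main() {
	m := managerState{Last: putInfo{42, true}, CandidateGen: 3, EvaluatedGen: 3}
	fmt.Println(lightPath(m, putInfo{42, true}), lightPath(m, putInfo{43, true})) // true false
}
```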

Compatibility

  • Zero behaviour change without networks:. Configs that omit the new keys hit exactly the pre-PR code paths: no Manager is installed, applyOnce never runs, updateProxy's pre-existing code path is unchanged, bucketNetworkPolicy is never written.
  • No new platform dependencies, no CGO, no third-party imports — standard library + golang.org/x/exp/slices / netip (already in use).
  • Cachefile forward/backward compatible: old kernels reading a new DB ignore bucketNetworkPolicy; new kernels reading an old DB (no bucket) start on branch B (cold-start, run initial evaluation after the provider barrier releases).
  • REST-layer forward-compat: the wire schema permits unknown fields (json.Unmarshal default) so new host versions can coexist with older kernels. YAML matchers are the opposite — unknown keys fail-fast, so config bugs surface early.

Security

The startup warning in warnNetworkPolicyExternalController fires when all three conditions hold simultaneously, per endpoint:

  1. At least one proxy group declares network-policy: in the config.
  2. external-controller / external-controller-tls binds a non-loopback address.
  3. No REST secret is set (TLS endpoint with client-auth-type: require-and-verify + a client-auth cert is treated as equivalent to a secret for this check).

Non-blocking — it's a nudge for the operator, never an abort. Unix sockets and Windows named pipes are intentionally skipped (filesystem / pipe ACLs already gate access).

The underlying rationale: on a policy-bearing select group, PUT /network/context indirectly picks proxies for the user. An unauthenticated LAN attacker could otherwise push a crafted context to force traffic through a specific node.
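
The three-condition check can be sketched as follows. This is not the body of warnNetworkPolicyExternalController: isLoopbackBind and shouldWarn are hypothetical helpers, and treating an empty host (for example ":9090") as exposed is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// isLoopbackBind reports whether a listen address such as "127.0.0.1:9090"
// binds a loopback address only.
func isLoopbackBind(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip, err := netip.ParseAddr(host)
	if err != nil {
		return false // empty host or unresolved hostname: assume exposed
	}
	return ip.IsLoopback()
}

// shouldWarn combines the three conditions for one endpoint. mutualTLS stands
// for a TLS endpoint with required and verified client certificates.
func shouldWarn(hasPolicy bool, addr, secret string, mutualTLS bool) bool {
	if !hasPolicy || addr == "" {
		return false
	}
	return !isLoopbackBind(addr) && secret == "" && !mutualTLS
}

func main() {
	fmt.Println(shouldWarn(true, "0.0.0.0:9090", "", false))   // true
	fmt.Println(shouldWarn(true, "127.0.0.1:9090", "", false)) // false
	fmt.Println(shouldWarn(true, "0.0.0.0:9090", "s3cr3t", false)) // false
}
```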

Testing

  • go test ./component/networkpolicy/... ./config/... ./adapter/outboundgroup/... ./hub/route/... ./hub/executor/... ./component/profile/cachefile/... ./listener/... all green.
  • ~4,000 LoC of tests across the new _test.go files. Coverage includes: normalisation idempotency, FNV fingerprint stability, matcher atomic-AND semantics and combinators against the full architecture test matrix, state machine branches (including cold-start branch A vs B, provider barrier, startup eval pending flag), TTL timer stale-callback protection, persistence downgrade scenarios, manual-wins golden paths, concurrent HandleManualSet + PutContext, REST handler error-code mapping end-to-end (including the ttl > 10 years upper-bound case), and the hot-reload Install → inheritFrom state-migration path.
  • Manual smoke: start with networks: + network-policy:; curl -X PUT /network/context with a multi-interface body and observe applied[*].reason; re-PUT with same body → unchanged_network; PUT /proxies/Smart -d '{"name":"us"}' then re-PUT same context → manual_locked; change the interfaces[] to a different network → auto takeover; DELETE → state preserved, groups keep their current proxies; next PUT resumes state machine with retained source.

Out of scope / Follow-ups

  • Host-side detection is not in this PR — mihomo alone only switches when an external actor (clash-verge-rev, FlClash, a cron script, systemd-networkd dispatcher) PUTs a context. The kernel contract is stable; GUI implementations (clash-verge-rev#1231) can land whenever.
  • No automatic Linux netlink collector in-tree. The architecture allows adding one later as an opt-in component for headless deployments.
  • Matcher metered: is currently fail-fast (reserved field on the wire but disabled in the matcher). Three-platform samplers don't collect it yet; opening the matcher now would let users write not: {metered: true} rules that silently always match.
  • macOS utun real bound name is still best-effort (pre-bind estimate from CalculateInterfaceName()). Tightening that to a true post-bind read-back is deferred.

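Chinese version (translated)

Update (2026-04-22): This PR has been force-pushed; a clean-slate redesign replaces the original "single primary interface" version. The original design was not expressive enough for desktop scenarios (wired and Wi-Fi online together, Wi-Fi and VPN coexisting, "at the office" covering both wired and Wi-Fi); the RFC discussion is at #2722. The old revision consumed no reviewer time (0 formal review comments), so an in-place rewrite is the least disruptive option. If maintainers prefer a new PR number for a clean review history, let me know and I will close this one and open a new one.

Summary

Adds a top-level networks: YAML field and a network-policy: subfield on select groups, exposed through the three REST endpoints PUT / DELETE / GET /network/context. When the host detects a network change it pushes a NetworkContext containing the full set of active interfaces to the kernel; the kernel evaluates the networks: list in order with first-match semantics and applies the corresponding target proxy to each select group that carries a policy. Manual user picks are respected within the same network, and the automatic flow takes over when the network switches. When networks: is not configured, behaviour is byte-for-byte identical to earlier versions.

Closes MetaCubeX/mihomo#1330. Design discussion: #2722. Pairs with the GUI-side work in clash-verge-rev#1231.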

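Differences from the previous revision (why the force-push)

The previous 5-commit revision modeled NetworkContext as a single "primary interface" snapshot, with primary_iface / iface_type / ssid / gateway_mac and similar fields directly at the top level. That abstraction is too narrow on desktop:

  • Multiple NICs online together is the norm (wired + Wi-Fi), and the host has no universally valid rule for picking "the" primary interface; choosing one discards the other's identity.
  • {iface-type: vpn, ...} rules could never match a user-installed WireGuard (when the VPN is not the primary, the host's "physical interface first" fallback hides it).
  • Combined conditions such as "home Wi-Fi + company VPN" cannot be expressed at all in a single-interface schema.

The v2 design in this PR:

  • The host reports all active interfaces (interfaces[], hard cap 32), plus a top-level dns_suffix list.
  • Matchers evaluate with ∃iface ∈ interfaces semantics; multiple per-iface fields in the same block must all be satisfied by the same iface (atomic AND). any: / all: / not: are first-class grammar combinators.
  • Precedence between networks is expressed by the order of the networks: list (first-match); the "primary iface" concept is dropped entirely.

The design is deliberately restrained: no CGO, no new platform dependencies, no subscription-pushed policy (YAML remains the source of truth).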

What's in this PR

12 commits, each of which passes go build / go vet / go test on its own:

  1. feat(network-policy): add package skeleton with multi-interface schema: new component/networkpolicy/ package containing NetworkContext, normalization, the FNV-64a fingerprint, and the block-level matcher AST (any: / all: / not: plus ∃iface semantics).
  2. feat(network-policy): parse networks and network-policy in config: wires into config.Parse; ParseNetworks + ParseGroupPolicy (static proxies fail-fast, provider nodes tolerated until the first PUT, default is a reserved key).
  3. feat(network-policy): persist bucketNetworkPolicy: new cachefile bucket storing {source, last_matched_network}; the decode path tolerates schema drift.
  4. feat(network-policy): add manager with state machine and TTL: the Manager singleton, with a per-group state machine + global ctx snapshot + TTL timer with generation-based stale-callback protection + five-condition TTL light path + provider first-load barrier.
  5. feat(network-policy): wire manager into executor lifecycle: executor.go owns Install / Uninstall (the package exports only these two; Install performs hot-reload state migration through the private inheritFrom); hot reload inherits per-group state / ctx / TTL / selected proxy.
  6. feat(network-policy): expose /network/context REST endpoints: hub/route/network.go registers the three endpoints, with a 10 MiB body cap, strict JSON end-of-input checking, and {code, message} error-code classification.
  7. feat(network-policy): hook manual PUT /proxies/:name into state machine: updateProxy calls Manager.HandleManualSet after selector.Set succeeds; groups without network-policy behave as before.
  8. feat(network-policy): warn on exposed external-controller without secret: emits a startup warning when all three conditions hold (policy present + non-loopback + no secret).
  9. refactor(network-policy): narrow exported API surface: package-private wherever possible, keeping only the necessary exports.
  10. docs: document network-policy config and REST API: docs/config.yaml gains a commented example block; the new docs/api.md is a complete REST reference (covering every hub/route/ endpoint and the full /network/context contract).
  11. feat(tun): expose actual bound device name in /configs response: the fd-override branch of sing_tun.New writes back options.Device = tunName so that GET /configs.tun.device returns the name the listener actually bound, letting host samplers filter out mihomo's own TUN.
  12. docs(network-policy): fix dangling reference in config.yaml: post-review fix. The comment block from commit 10 pointed at docs/network-policy.md, which was never committed; it now points at the existing docs/api.md §11 (the full field list / combinator semantics / REST contract all live there).

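Key runtime semantics

Manual-wins state machine: after a user switches manually, the group has source=manual; on the next PUT:

  • matched unchanged + source=manual → manual_locked, the user's choice is kept.
  • matched changed and evaluation lands in matched / already_selected / default / no_change_no_default → auto takes over, source resets to auto, last_matched advances.
  • matched changed but evaluation lands in missing_target (the policy-resolved target proxy is not currently among this group's candidates) → the switch is skipped, state is preserved, and it is retried on the next PUT.

DELETE only clears the ctx snapshot and preserves the state machine (no evaluation is triggered and default is not applied). To express "host is offline", send PUT {version:1, interfaces:[]} so that each group falls back to default.

Persistence: bucketSelected (existing, proxy name) + new bucketNetworkPolicy ({schema_version, source, last_matched}); both are gated by profile.store-selected: true.

TTL light path: when all five conditions hold, a PUT skips the full evaluation pipeline (fingerprint unchanged / both this body and the previous one carry a ttl / no pending missing_target / candidate set has not drifted).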

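Compatibility

Zero behaviour change when networks: is not configured; no CGO or new dependencies; the cachefile is compatible in both directions; the wire schema tolerates unknown fields, while matcher YAML fails fast on unknown keys.

Testing

go test ./component/networkpolicy/... ./config/... ./adapter/outboundgroup/... ./hub/route/... ./hub/executor/... ./component/profile/cachefile/... ./listener/... all green. About 4,000 LoC of tests cover normalization, fingerprint, matcher atomic AND, state machine branches (including branches A/B, provider barrier, startupEvalPending), TTL stale-callback protection, persistence downgrade, manual-wins golden paths, concurrent HandleManualSet + PutContext, end-to-end REST error codes, and hot-reload inheritance.

Out of scope

  • Host-side detection is not part of this PR. The kernel contract is stable, and GUI implementations can proceed independently.
  • The kernel does not include an automatic Linux netlink collector.
  • The metered: matcher is currently fail-fast (the wire field is reserved; the three platform samplers do not collect it yet).
  • The actual bound name of macOS utun is still best-effort (a pre-bind estimate); a true post-bind read-back is deferred.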

@wangwei354

Author

The joint kernel and GUI design RFC is at #2722.

@wangwei354

Author

Update: clean-slate rewrite (force-pushed)

TL;DR (EN): This PR's branch has been force-pushed (1182a2da → a815cee3) to a ground-up redesign. The original 5-commit revision used a single-primary-interface NetworkContext; it turned out too narrow for common desktop multi-NIC scenarios. No reviewer time had been invested (0 formal review comments), so the in-place rewrite is the least disruptive path. If you'd prefer a new PR number for a clean review history, I'm happy to close + reopen; just say so.

Summary (Chinese, translated): This PR's branch has been force-pushed to the redesigned version (1182a2da → a815cee3). The original 5-commit version modeled NetworkContext on a single primary interface, which in practice was not expressive enough for common desktop multi-NIC scenarios. The original version consumed no reviewer time (0 formal review comments), so an in-place rewrite is the least disruptive option; if maintainers prefer a new PR number for a clean history, let me know and I will close this one and open a new one.

What changed (high level)

| | v1 (original) | v2 (this rewrite) |
| --- | --- | --- |
| Wire schema | single primary iface: primary_iface / iface_type / ssid / gateway_mac / dns_suffix at top level | interfaces[] (hard-capped 32) + top-level dns_suffix[] |
| Matcher semantics | field match against the primary iface | ∃iface ∈ interfaces with per-iface atomic AND; any: / all: / not: combinators as grammar primitives |
| Precedence | implicit (primary wins) | networks: list order (first match wins); "primary" concept dropped |
| VPN coexistence | hidden when not primary | visible and matchable as iface-type: vpn; cross-iface AND via all: |
| Error envelope | {"message":"..."} | {"code":"...","message":"..."} with stable code values for structured host handling |
| DELETE semantics | triggers default evaluation | clears ctx snapshot only, per-group state preserved |

What didn't change

  • The kernel vs host split. Detection is still the host's job; kernel owns rule semantics, select group state, persistence, REST.
  • Manual-wins as a hard-coded invariant (same-matched_network manual picks are respected; switching matched_network hands control back to the auto flow).
  • Zero behavior change when networks: is absent.
  • No new platform deps, no CGO.

Commit history

Now 11 commits, each independently go build / go vet / go test green:

a815cee3  feat(tun): expose actual bound device name in /configs response
9d62bb4a  docs: document network-policy config and REST API
e95f0e97  refactor(network-policy): narrow exported API surface
2528f5cf  feat(network-policy): warn on exposed external-controller without secret
6c793fb2  feat(network-policy): hook manual PUT /proxies/:name into state machine
798a4123  feat(network-policy): expose /network/context REST endpoints
f5a82d81  feat(network-policy): wire manager into executor lifecycle
c1a40851  feat(network-policy): add manager with state machine and TTL
747d1ca6  feat(network-policy): persist bucketNetworkPolicy
17635000  feat(network-policy): parse networks and network-policy in config
6f7e8588  feat(network-policy): add package skeleton with multi-interface schema

Design discussion updated

The design RFC at #2722 has been updated in lockstep so it reflects the v2 schema, matcher grammar, state machine, and REST contract as shipped in this PR.

The PR description at the top of this page is the single source of truth for review; please refer to that rather than older inline discussion above.

@wangwei354

Author

Builds covering the full chain, from mihomo through Clash Verge Rev:
https://github.com/wangwei354/clash-verge-rev/releases

@wwqgtxx force-pushed the Alpha branch 5 times, most recently from 61b8d7f to 17bed79 (May 15, 2026 10:13)
wangwei354 added 12 commits May 20, 2026 17:28
Introduce the networkpolicy package for host-pushed network context +
rule-list evaluation. Schema uses a multi-interface inventory shape:
NetworkContext.Interfaces is a required array of InterfaceContext
entries; dns_suffix is a []string at the top level; ttl is an optional
*int. No single-primary field.

Files:
- context.go: NetworkContext + InterfaceContext types. Custom
  UnmarshalJSON enforces wire-required version/interfaces presence
  (malformed_body at the REST layer). normalize() + validate()
  canonicalize MAC forms, strip IPv6 zones, sort+dedupe+filter-empty
  subnets, sort interfaces by name, and check name uniqueness,
  MaxInterfaces=32 cap, TTL bounds, gateway_mac-requires-gateway_ip,
  dns_suffix character rules (commas rejected to prevent within-field
  aliasing in fingerprint's join-by-comma). Fingerprint() emits fnv64a
  of length-prefixed fields keyed by "iface.<idx>.<field>" in sorted
  order; version field is the literal string "1" so Fingerprint stays
  byte-stable independent of c.Version state.
- matcher.go: matchBlock as the AST node, compiled as the tuple
  (ifacePred, globalPred, combinators). Evaluation:
  block ≡ (P_empty ? true : ∃iface . P(iface)) ∧ G ∧ C_1 ∧ … ∧ C_n.
  Atomicity (per-iface field AND is on the same iface) is enforced
  because ifacePred operates on InterfaceContext. `any:` / `all:` /
  `not:` combinators all enabled. Global `dns-suffix` matcher
  evaluates set intersection vs ctx.DNSSuffix. `metered` matcher is
  rejected at parse-time (wire field still accepted, but no platform
  sampler populates it yet so not:{metered:true} would silently
  always-match). All Matcher implementations return false for nil
  ctx rather than panic.
- network.go: first-match Match(networks, ctx) entry.
- policy.go: GroupPolicy + reason/source enum constants + private
  selectable / selectorWithPolicy interfaces (consumed by Manager
  commit).
- util.go: normalizeMAC / meteredString + new normalizeDNSSuffix.

29 unit tests cover: interfaces normalization (sort-by-name, MAC
forms, IPv6 zone strip, subnets dedup-after-masked), validation
(duplicate names, >MaxInterfaces cap, gateway_mac requires gateway_ip,
version=0 rejected, comma rejected in dns_suffix), atomic ∃-wrapping
vs split-across-ifaces, combinator type identity at len>=2 (anyMatcher
/allMatcher), global+per-iface mix, not:/any:/all: combinators
including empty-interfaces behavior, null-field exclusion from ∃
domain, gateway-mac hit/miss/null/list/normalization, subnets
wildcard /0 behavior, JSON tri-state decoding, rejection of missing
required fields, rejection of scalar dns_suffix, and Matcher.Match
nil-safety.

Clean-slate: no v1/v2 compat layer; replaces the previously-open
feature/network-policy branch (archived as feature/network-policy_v1
+ tag feature-network-policy-v1-archive-20260421).

Wire the YAML layer for the network-policy feature. Host pushes a
NetworkContext (M1 skeleton) and kernel needs to evaluate it against
user-authored rules; this commit adds the parse + validate pass.

Top-level `networks:` list:
- component/networkpolicy/config_parse.go::ParseNetworks walks the
  raw []map[string]any, rejects unknown top-level keys (only `name`
  and `match` accepted), reserved names (default / <none>), empty /
  non-string / duplicate names, and delegates match-block parsing to
  the existing ParseMatch. Parse errors from ParseMatch (including
  the metered matcher rejection) propagate with contextual wrapping.

Per-select-group `network-policy:` subfield:
- ParseGroupPolicy validates every (network → target) entry. Validation
  ordering is intentional: key validity is checked before target shape
  so a reserved key (<none>) or unknown network is reported rather than
  a downstream non-string-target complaint.
    * keys must be in the top-level networks[] names or equal DefaultKey;
      <none> rejected as reserved sentinel
    * targets must be non-empty strings and pass the architecture §5.8.1
      reachability split:
        1. target ∈ StaticProxies (post Filter / ExcludeFilter /
           ExcludeType) → parse-time visible, OK
        2. target ∈ globalProxyNames but ∉ StaticProxies → fail-fast
           with an error that suggests both reachability fixes and
           rename of a colliding provider-emitted node
        3. target ∉ globalProxyNames && HasProvider → tolerant
           (may come from a subscription; runtime reports missing_target
           on first PUT if still absent)
        4. target ∉ globalProxyNames && !HasProvider → fail-fast
           (target doesn't exist anywhere)
    * default's target goes through the same reachability check
- Empty `network-policy: {}` is rejected (users who want "no policy"
  should omit the field; an explicit empty map is a typo most of the
  time).
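
The four-way reachability split above can be sketched in Go as follows. GroupSource here is a reduced, hypothetical version of the type named in the commit, and checkTarget and its error texts are illustrative only.

```go
package main

import "fmt"

// GroupSource is a reduced, hypothetical view of what a select group can reach.
type GroupSource struct {
	StaticProxies map[string]bool // after filter / exclude-filter / exclude-type
	HasProvider   bool            // group has external providers (use:)
}

// checkTarget classifies one policy target at parse time.
func checkTarget(target string, src GroupSource, global map[string]bool) error {
	switch {
	case src.StaticProxies[target]:
		// 1. visible at parse time
		return nil
	case global[target]:
		// 2. known globally, but this group cannot reach it
		return fmt.Errorf("target %q exists globally but is not reachable from this group", target)
	case src.HasProvider:
		// 3. may arrive from a subscription; runtime reports missing_target if not
		return nil
	default:
		// 4. does not exist anywhere
		return fmt.Errorf("target %q does not exist", target)
	}
}

func main() {
	global := map[string]bool{"hk": true, "us": true, "DIRECT": true}
	src := GroupSource{StaticProxies: map[string]bool{"hk": true}, HasProvider: true}
	for _, t := range []string{"hk", "us", "node-from-sub"} {
		fmt.Println(t, checkTarget(t, src, global))
	}
}
```

Checking the global set before HasProvider is what makes case 2 fail even when the group has a provider.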

Wiring:
- config/config.go: RawConfig.Networks []map[string]any plus
  Config.Networks []networkpolicy.Network; ParseRawConfig parses
  networks first, passes the network-names list into parseProxies.
  parseProxies pre-computes a stable global proxy-name set (built-ins
  + top-level proxies: + every declared proxy-group name + auto-
  injected GLOBAL) BEFORE the group-parsing loop and threads it into
  every ParseProxyGroup call, so the §5.8.1 "globally known" check is
  iteration-order independent (proxyGroupsDagSort topologically sorts
  groups but leaves independent groups in arbitrary order). After
  group parsing, warnOrphanNetworks emits a log warning for any
  network defined under `networks:` but referenced by no select
  group's network-policy mapping (architecture §5.8.1 orphan
  diagnostic; non-blocking).
- adapter/outboundgroup/parser.go: accepts AllNetworks []string and
  AllProxyNames []string; for `select` groups, builds a GroupSource
  from the Exclude{Filter,Type}-filtered static proxy set plus
  external-provider presence, invokes ParseGroupPolicy with the
  caller-supplied AllProxyNames, and records both on the new Selector.
  The new filterStaticProxies helper mirrors GroupBase.GetProxies's
  runtime filtering so the "visible at parse time ⇒ reachable at
  runtime" invariant promised by M1's GroupSource docs actually holds
  (without it, exclude-filter / exclude-type could silently demote a
  parse-time error into a runtime missing_target). filterStaticProxies
  duplicates NewGroupBase's regex/split logic — a code comment flags
  the hand-sync requirement. For non-select group types, presence of
  `network-policy:` is a parse-time error (§5.8.1), including the
  degenerate null-value case. External-provider detection uses
  len(groupOption.Use) > 0; the internal CompatibleProvider wrapping
  proxies: does not count.
- adapter/outboundgroup/selector.go: Selector gains `policy` and
  `groupSource` fields plus HasProxy / NetworkPolicy / GroupSource /
  SetNetworkPolicy methods, which together satisfy the M1
  selectorWithPolicy interface. SetNetworkPolicy is a post-construction
  setter so the GLOBAL default selector and tests keep working without
  signature churn.

Tests:
- component/networkpolicy/config_parse_test.go: unit tests covering
  empty input, happy-path with multi-entry networks, reserved-name
  rejection (both default and <none>), duplicate / missing / empty /
  non-string names, missing / non-map match block, unknown top-level
  keys, empty entry, metered propagation, unknown matcher key
  propagation; ParseGroupPolicy happy (asserts Mapping size to guard
  against default-leakage) + provider-tolerant + global-but-unreachable-
  fails-even-with-provider + unknown-target-no-provider + unknown-
  network + default-only + matched-none-reserved + empty-map + non-map
  + non-string-target + empty-target + default-target reachability +
  reserved-key-takes-precedence-over-target-shape + unknown-network-
  takes-precedence-over-target-shape.
- adapter/outboundgroup/parser_test.go: filterStaticProxies unit
  coverage (no-filter defensive copy, empty input, single + backtick-
  multi exclude-filter, exclude-type case-insensitive + pipe-multi,
  unresolved proxy name passthrough) + ParseProxyGroup coverage for
  non-select rejection, exclude-filter-demotes-built-in-target
  rejected at parse time as "unreachable" (DIRECT lives in the global
  set), exclude-filter-not-hitting-target happy, references-later-
  group regression (targets a group name not yet in proxyMap must
  still trip the global-but-unreachable branch), and select-without-
  network-policy happy.

Build and existing tests remain clean. The M1 selectorWithPolicy
interface is now satisfiable by *outboundgroup.Selector; M3 Manager
will consume these without signature changes.

Groundwork for M3b's manager state machine: wire up the cachefile
bucket and serialization layer for per-group network-policy state, so
the manager only needs to read/write bytes through a stable interface.

component/networkpolicy/persist.go:
- PersistedState carries per-group {schema_version, source,
  last_matched_network} with a tri-state last_matched encoding:
    * nil (never evaluated) → JSON null. Legitimately reached two
      ways: branch B initial state before any evaluation, and
      source=manual + nil when the user flipped a group manually
      before any ctx PUT arrived.
    * MatchedNone sentinel (evaluated, no match) → logical 6-char
      `<none>`. encoding/json escapes `<` and `>` using the Unicode
      form (not HTML entities), so the raw on-disk byte sequence
      between the JSON string quotes is the 16-char
      `\u003cnone\u003e`; Unmarshal restores the logical form. bbolt
      inspection tools display the escaped sequence.
    * concrete name → JSON "<name>".
  Custom MarshalJSON/UnmarshalJSON route the presence bit through a
  single JSON key (last_matched_network) rather than splitting into two
  keys — simpler for future cache consumers.
- schema_version is intentionally distinct from the PUT body `version`
  field (architecture §5.6.1): bucket-level format version vs.
  context-schema version, one can evolve without the other.
- Validate() enforces the state-machine invariants that §5.6.1 /
  §5.6.2 / §5.6.3 implicitly promise. Structural consistency runs
  first, then source-specific rules, so a programmer mistake
  (LastMatchedPresent=false + non-empty LastMatched) gets a targeted
  error rather than a downstream "source=auto requires present"
  diagnostic that hides the real bug. Rejected states:
    * schema_version != current
    * LastMatchedPresent=true + empty LastMatched
    * LastMatchedPresent=true + LastMatched == DefaultKey
    * LastMatchedPresent=false + non-empty LastMatched (inconsistent)
    * source=auto + LastMatchedPresent=false (§5.6.2 auto-setting
      transitions always advance last_matched)
    * source=unknown (§5.6.3 never writes the initial state)
    * source outside {auto, manual, unknown}
- MarshalValidated() runs Validate before json.Marshal — the typed
  marshaling helper for programmer-facing code.
- WriteNetworkPolicyState() is the typed write-path entry: chains
  Validate → json.Marshal → cachefile.SetNetworkPolicyState so M3b's
  manager never has to hand-roll the sequence. cachefile.Set is still
  raw-bytes (schema ownership stays here), so the safety guarantee
  comes from everyone routing writes through this helper.
- REST-layer wire encoding (null for both nil and MatchedNone, host
  disambiguates via selection_source) is a separate concern owned by
  the manager; the persistence layer keeps the two absence
  representations distinct so state-machine history survives restart.
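
The tri-state encoding described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `state` and `wire` types and the `roundTrip` helper are hypothetical, and only the three JSON keys and the `<none>` sentinel are taken from the commit message. A nil pointer stands for "never evaluated" and marshals to JSON null.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchedNone is the reserved "evaluated, no match" sentinel.
const matchedNone = "<none>"

// state is a hypothetical stand-in for the persisted record.
type state struct {
	SchemaVersion      int
	Source             string
	LastMatched        string
	LastMatchedPresent bool // false means "never evaluated"
}

// wire is the assumed on-disk shape; a nil pointer becomes JSON null.
type wire struct {
	SchemaVersion int     `json:"schema_version"`
	Source        string  `json:"source"`
	LastMatched   *string `json:"last_matched_network"`
}

// MarshalJSON routes the presence bit through a single JSON key.
func (s state) MarshalJSON() ([]byte, error) {
	w := wire{SchemaVersion: s.SchemaVersion, Source: s.Source}
	if s.LastMatchedPresent {
		w.LastMatched = &s.LastMatched
	}
	return json.Marshal(w)
}

// UnmarshalJSON restores the presence bit from null vs. string.
func (s *state) UnmarshalJSON(b []byte) error {
	var w wire
	if err := json.Unmarshal(b, &w); err != nil {
		return err
	}
	s.SchemaVersion, s.Source = w.SchemaVersion, w.Source
	s.LastMatched, s.LastMatchedPresent = "", false
	if w.LastMatched != nil {
		s.LastMatched, s.LastMatchedPresent = *w.LastMatched, true
	}
	return nil
}

// roundTrip returns the raw JSON text plus the decoded copy.
func roundTrip(in state) (string, state, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return "", state{}, err
	}
	var out state
	err = json.Unmarshal(raw, &out)
	return string(raw), out, err
}

func main() {
	for _, s := range []state{
		{1, "manual", "", false},       // nil: never evaluated
		{1, "auto", matchedNone, true}, // evaluated, no match
		{1, "auto", "office", true},    // concrete network name
	} {
		raw, back, err := roundTrip(s)
		fmt.Println(raw, back.LastMatchedPresent, err)
	}
}
```

Because encoding/json escapes `<` and `>` by default, the second record shows the sentinel in its escaped form in the raw bytes while the decoded value is the logical `<none>` again.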

component/profile/cachefile/cache.go:
- New bucketNetworkPolicy ([]byte("networkpolicy")) alongside
  bucketSelected / bucketFakeip / etc.
- SetNetworkPolicyState(group, value): writes raw JSON bytes; gated on
  profile.StoreSelected so network-policy persistence follows the same
  opt-in as selected-proxy persistence (toggling StoreSelected off
  makes subsequent loads skip the bucket). Deliberately accepts raw
  bytes so the networkpolicy package retains schema ownership;
  callers should go through networkpolicy.WriteNetworkPolicyState.
- DeleteNetworkPolicyState(group): removes an entry, used for orphan
  GC (hot reload dropping a group's network-policy) and for the
  manager to invalidate a corrupted record on load. Deliberately NOT
  gated on StoreSelected — an unconditional Delete is required so
  stale records cannot resurrect when the user toggles StoreSelected
  back on.
- NetworkPolicyStateMap() returns (map, bucketExists bool): the flag
  distinguishes "bucket doesn't exist" (branch B) from "bucket exists
  but is empty" (branch A with no prior per-group state), which
  architecture §5.6.2 treats as different starting conditions. An
  empty but existing bucket still means branch A; groups missing from
  the map simply start fresh within branch A semantics.
- Value bytes are treated opaquely here; the networkpolicy package
  owns the schema, keeping the two packages decoupled.

Tests (component/networkpolicy/persist_test.go):
- Concrete-name / MatchedNone / nil-sentinel JSON round-trips. The
  MatchedNone test asserts the 16-char `\u003cnone\u003e` escape
  sequence actually lands on disk, so the commit's byte-level doc
  claim is regression-guarded. The nil-sentinel test is a format-
  layer regression (source=unknown + null is the in-memory initial
  state Validate correctly rejects on load); the separate Validate
  tests enforce the state-machine invariant.
- Validate happy-paths enumerated explicitly: auto + name, auto +
  MatchedNone, manual + name, manual + MatchedNone, manual + nil.
- Validate negative cases: unsupported schema_version, invalid
  source, source=unknown on load, source=auto with nil last_matched,
  empty LastMatched with Present=true, reserved DefaultKey as
  LastMatched, inconsistent Present=false with non-empty LastMatched
  (covered on both auto and manual branches, since the structural
  check is source-agnostic), and an ordering regression that asserts
  the structural error wins over the source-specific one.
- MarshalValidated rejects invalid structs (documenting the contract
  vs raw json.Marshal) and passes valid ones through cleanly.
- Missing-field robustness (empty {} decodes to zero-value, which
  Validate will reject so callers fall through to branch B).
- Malformed-JSON error propagation.
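
The "structural check first, then source-specific rules" ordering that these tests lock in might look like the sketch below. The `state` type, the `currentSchema` value, and the error texts are assumptions made for illustration; only the list of rejected states follows the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	currentSchema = 1         // hypothetical schema_version value
	defaultKey    = "default" // reserved policy key, never a valid last_matched
)

// state is a hypothetical stand-in for the persisted record.
type state struct {
	SchemaVersion      int
	Source             string
	LastMatched        string
	LastMatchedPresent bool
}

// validate checks structural consistency before source-specific rules,
// so an inconsistent presence bit is reported as such.
func (s state) validate() error {
	if s.SchemaVersion != currentSchema {
		return fmt.Errorf("unsupported schema_version %d", s.SchemaVersion)
	}
	switch {
	case s.LastMatchedPresent && s.LastMatched == "":
		return errors.New("structural: present but empty last_matched")
	case s.LastMatchedPresent && s.LastMatched == defaultKey:
		return errors.New("structural: last_matched uses reserved default key")
	case !s.LastMatchedPresent && s.LastMatched != "":
		return errors.New("structural: absent but non-empty last_matched")
	}
	switch s.Source {
	case "manual":
		return nil // manual + nil is a legitimate state
	case "auto":
		if !s.LastMatchedPresent {
			return errors.New("source=auto requires last_matched")
		}
		return nil
	case "unknown":
		return errors.New("source=unknown is never persisted")
	default:
		return fmt.Errorf("invalid source %q", s.Source)
	}
}

func main() {
	// Both rules are violated here; the structural one is reported.
	fmt.Println(state{1, "auto", "office", false}.validate())
	fmt.Println(state{1, "manual", "", false}.validate())
}
```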

No manager code yet — that lands in M3b. No executor wiring — that
lands in M3c. This commit is strictly the persistence contract so the
subsequent manager commit can stay focused on state-machine logic.

Core state-machine kernel for the network-policy feature. Consumes the
M3a persistence layer and the M2 GroupPolicy schema; exports the APIs
M3c executor wiring will call.

component/networkpolicy/manager.go:
- Manager: single serial-queue kernel owning per-group state, cached
  NetworkContext snapshot, TTL timer, provider barrier flags, and the
  TTL light-path decision vector.
- NewManager restores per-group state from the bucket with the branch
  A / B split (architecture section 5.6.2):
    * branch A (bucket exists): each group's saved state populates
      source and last_matched_network; corrupt entries are dropped
      and GC'd via DeleteNetworkPolicyState; groups with no entry
      start unknown/nil inside branch A.
    * branch B (no bucket): every group starts unknown/nil with
      startup_eval_pending=true, so ReleaseBarrier runs an internal
      matched=none (or cached-ctx) evaluation.
- PutContext deep-copies the caller's NetworkContext and runs
  NormalizeAndValidate (defensive-copy rule). Computes fingerprint,
  tries TTL light path first; on miss, enters serial queue and runs
  the full state machine with all seven reasons (matched /
  already_selected / default / no_change_no_default / unchanged /
  manual_locked / missing_target).
- DeleteContext clears cached ctx + TTL timer but preserves
  source / last_matched / selected proxy.
- HandleManualSet records source=manual; last_matched unchanged;
  startup_eval_pending is NOT cleared so post-barrier recheck can
  still reassert auto takeover on network change.
- ReleaseBarrier: per-group provider-barrier release. Groups still
  pending re-evaluate with cached ctx or matched=none. After eval
  it re-publishes the global atomicHasPendingMissingTarget so the
  TTL light path condition (d) re-enables when a barrier-period
  missing_target has resolved.
- ForceReEvaluate: hot-reload path. Runs against cached ctx bypassing
  unchanged/manual_locked short-circuits so YAML policy edits take
  effect. Consumes candidate_set_dirty via snapshot-CAS.
- OnCandidateSetDirty: uses an atomic counter (not a boolean) so
  concurrent invocations during an in-flight full evaluation are
  preserved via CompareAndSwap on the pre-eval snapshot.
- GetStatus: matched_network computed from cached ctx (not per-group
  last_matched, which stays nil under missing_target); AgeSeconds
  populated from ctxReceivedAt stamp.
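
The snapshot-CAS handling of candidate_set_dirty can be illustrated with a small sketch. The `dirtyCounter` type and its method names are hypothetical; the point is only that a counter plus CompareAndSwap keeps a mark that arrives while an evaluation is in flight, which a plain boolean reset would lose.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dirtyCounter is a hypothetical stand-in for candidate_set_dirty.
type dirtyCounter struct{ n atomic.Uint64 }

// mark records one invalidation of the candidate set.
func (d *dirtyCounter) mark() { d.n.Add(1) }

// snapshot is read before a full evaluation starts.
func (d *dirtyCounter) snapshot() uint64 { return d.n.Load() }

// consume clears the counter only if no mark arrived since snap was
// taken. It reports whether the counter was cleared.
func (d *dirtyCounter) consume(snap uint64) bool {
	return d.n.CompareAndSwap(snap, 0)
}

// dirty reports whether any invalidation is still pending.
func (d *dirtyCounter) dirty() bool { return d.n.Load() != 0 }

func main() {
	var d dirtyCounter
	d.mark()
	snap := d.snapshot() // taken before the evaluation
	d.mark()             // concurrent invalidation during the evaluation
	fmt.Println(d.consume(snap), d.dirty())
}
```

In this run the CAS fails because the counter moved past the snapshot, so the flag stays set and the next heartbeat would still fall through to a full evaluation.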

Concurrency guards:
- Mutable path fully serialized under mu; TTL light path reads five
  atomic fields without mu and re-checks under mu before committing.
- TTL timer stale-fire guard: ttlGen monotonic stamp incremented on
  start/stop; onTTLExpired bails when gen mismatches. Without this,
  a TTL renewal racing with the old timer firing would let the old
  callback wipe the newly-stored ctx.
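
A minimal sketch of the stale-fire guard, assuming a generation counter captured by the timer callback. The `ctxCache` type, its fields, and the use of a plain string for the cached context are illustrative only, not the Manager's actual layout.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ctxCache is a hypothetical, reduced model of the cached context.
type ctxCache struct {
	mu    sync.Mutex
	gen   uint64 // bumped on every timer start/stop
	ctx   string // stand-in for the cached NetworkContext
	timer *time.Timer
}

// store caches ctx and (re)starts the TTL timer. It returns the
// generation the new timer is bound to.
func (c *ctxCache) store(ctx string, ttlSeconds int) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.timer != nil {
		c.timer.Stop() // cannot stop a callback that already started
	}
	c.gen++
	gen := c.gen
	c.ctx = ctx
	ttl := time.Duration(ttlSeconds) * time.Second
	c.timer = time.AfterFunc(ttl, func() { c.onExpired(gen) })
	return gen
}

// onExpired clears the cache unless the timer generation is stale.
func (c *ctxCache) onExpired(gen uint64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if gen != c.gen {
		return false // an older timer fired late; ignore it
	}
	c.ctx = ""
	return true
}

func (c *ctxCache) cached() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.ctx
}

// stop cancels the pending timer and invalidates its generation.
func (c *ctxCache) stop() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.timer != nil {
		c.timer.Stop()
	}
	c.gen++
}

func main() {
	c := &ctxCache{}
	old := c.store("office", 3600)
	c.store("home", 3600) // TTL renewal
	// Simulate the old timer firing after the renewal.
	fmt.Println(c.onExpired(old), c.cached())
	c.stop()
}
```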

Other details:
- Defensive deep-copy covers Interfaces / Subnets / Metered pointer /
  DNSSuffix / TTL; derived fields zeroed and repopulated.
- Cachefile writes gated on actual state change; all writes flow
  through WriteNetworkPolicyState so Validate invariants are
  enforced pre-disk.
- missing_target per-group bit ORs into the global atomic via
  recomputePendingMissingTargetLocked called from every evaluator.

manager_test.go (24 tests) covers: branch-B init, all seven reasons,
manual-wins three-phase (preserve / lock / takeover), manual preserves
startup_eval_pending, DELETE preserves state, TTL light path all five
conditions, TTL stale-fire regression, ReleaseBarrier three cases
plus atomic-clear regression, ForceReEvaluate three cases including
candidate_set_dirty consumption, GetStatus ctx-level matched + age_seconds,
deep-copy isolation.

M3c (next commit) wires ApplyConfig to construct the Manager, plumb
provider-ready signals into ReleaseBarrier, and call ForceReEvaluate
on hot reload. REST handlers come later in M4.

Plugs the M3b Manager into hub/executor.ApplyConfig so the
network-policy state machine becomes live at cold start and survives
hot reloads correctly. REST endpoints (PUT/DELETE/GET /network/context
and the manual-PUT hook) land in M4 on top of this wiring.

component/networkpolicy/install.go:
- Global() / Install() / Uninstall(): process-wide Manager accessor
  so REST handlers and provider-update hooks can find the active
  Manager without threading a pointer through every call site.
  Install is the only public construction entry point; it swaps the
  global pointer under a mutex, migrates in-memory state on hot
  reload, and releases per-group barriers at the end.
- inheritFrom: §5.8.3 "in-memory state always preserved across hot
  reload" protocol.
    * per-group source, last_matched_network (and presence),
      startup_eval_pending, missingTargetPending carry across for
      groups whose name appears in both the old and new Manager.
      Overrides whatever NewManager loaded from the cachefile, since
      in-memory state is authoritative per §5.6.1.
    * groups new to the reload keep their NewManager-initialized
      state (branch A or B). Groups absent from the new Manager are
      silently discarded.
    * cached ctx is re-adopted including expires_at, ttl pointer,
      fingerprint, and ctxReceivedAt (so AgeSeconds keeps reporting
      age-since-last-host-PUT, not age-since-reload). The TTL timer
      is re-started on the new Manager.
    * candidate_set_dirty counter is incremented unconditionally on
      reload to reflect §5.6.3's "group member list changed" rule, so
      the next TTL heartbeat falls through to a full evaluation.
    * global atomicHasPendingMissingTarget is recomputed from the
      migrated per-group bits.
- Uninstall: shutdown / test-cleanup hook. Stops any pending TTL
  timer on the outgoing Manager so AfterFunc goroutines don't leak
  across tests.

component/networkpolicy/install_test.go (8 tests): first-time
Install publishes Global; hot reload inherits per-group state
(source=manual + last_matched=office); hot reload inherits cached
ctx and ForceReEvaluate matches on it; new group starts fresh and
its branch-B barrier release advances source past unknown; dropped
group is discarded without panic; hot reload marks candidate_set_dirty;
Install releases the barrier and applies default on branch-B cold
start; Uninstall clears global and stops TTL timers.

component/networkpolicy/policy.go: `selectorWithPolicy` →
`SelectorWithPolicy` (also `selectable` → `Selectable`). Export was
necessary so hub/executor can assemble the []SelectorWithPolicy list
to hand to Install without the package importing outboundgroup
(which would be a cycle). Manager / tests rename follows.

hub/executor/executor.go:
- ApplyConfig calls updateNetworkPolicy(cfg) after loadProvider
  (providers are synchronously populated) and updateProfile
  (patchSelectGroup has restored each Selector's `selected` from
  bucketSelected). This ordering ensures the Manager inspects a
  fully-prepared state when it runs Install's ReleaseBarrier sweep.
- updateNetworkPolicy walks cfg.Proxies, unwraps each C.Proxy via
  its Adapter() accessor, collects outboundgroup.Selector instances
  with a non-empty NetworkPolicy(), sorts them by name for
  deterministic REST output, and calls networkpolicy.Install. Then
  ForceReEvaluate runs unconditionally — a no-op on first cold start
  (no cached ctx) and the §5.8.3 "cached-ctx re-evaluation" on hot
  reload.

No new cycles in the import graph: executor already imports
outboundgroup + cachefile; the new import is component/networkpolicy,
which itself only depends on cachefile (for the bucket) + stdlib.

Build and existing test suites (config, outboundgroup, networkpolicy,
cachefile) remain clean. M4 (REST endpoints) will consume Global()
to dispatch PUT/DELETE/GET /network/context and hook the manual
PUT /proxies/:name path into HandleManualSet.

PUT / DELETE / GET `/network/context` implement the host-to-kernel
control plane of the network-policy feature: host (clash-verge-rev
netmon or any compatible HTTP client) samples the active interfaces
and pushes a NetworkContext; kernel runs the state machine and
returns the applied[] per-group decision.

hub/route/network.go:
- networkContextRouter mounts the three handlers onto /network/context.
- putNetworkContext decodes the body via NetworkContext.UnmarshalJSON
  (which already enforces the two wire-required keys), enforces a
  10 MiB body cap as defense-in-depth and a strict-JSON "no trailing
  content after the root value" check, then dispatches into
  networkpolicy.Global().PutContext. Validation errors are mapped to
  the architecture §5.4.8 error-code vocabulary by classifyPutError,
  which uses errors.Is against the sentinel chain that context.go
  wraps every validation error with — so first-emitted-sentinel wins
  on composite failure bodies (covered by the
  BadMAC_EmptyGatewayIP_RoutesToInvalidField and
  BadVersion_TooManyInterfaces_RoutesToTooMany tests). Unknown errors
  (no recognized sentinel) surface as 5xx internal_error so host's
  retry/backoff policy kicks in — architecture §5.4.8 explicitly
  treats 5xx as transient.
- deleteNetworkContext always returns 204 (idempotent per §5.4.3);
  the state machine's source / last_matched / selected proxy are
  preserved.
- getNetworkContext returns 200 with a uniform schema even when no
  manager / no context is installed, so polling clients don't have
  to branch on 200-vs-404.
- Wire encoding for matched_network / last_matched_network follows
  §5.6.4: both the MatchedNone sentinel and the nil (never-evaluated)
  state serialize to JSON null; host disambiguates via
  selection_source. TTL is stripped from the GET echo per §5.4.4.
- Wire types (appliedRowWire / putWire / statusGroupWire / statusWire)
  are package-level so the PUT / GET / no-ctx paths share one source
  of truth for the JSON tags.
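
The body cap and the "no trailing content after the root value" check could be approximated as below. This is a sketch under assumptions: `decodeStrict`, `accepts`, and the two error values are hypothetical names, and an HTTP handler would more likely wrap the request body with `http.MaxBytesReader` than read from a plain `io.Reader`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

var (
	errTooLarge = errors.New("body exceeds limit")
	errTrailing = errors.New("trailing content after root JSON value")
)

// decodeStrict reads at most limit bytes, decodes exactly one JSON
// value into dst, and rejects anything but whitespace after it.
func decodeStrict(r io.Reader, limit int64, dst any) error {
	// Read one byte past the limit so an oversize body is detectable.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return err
	}
	if int64(len(data)) > limit {
		return errTooLarge
	}
	dec := json.NewDecoder(bytes.NewReader(data))
	if err := dec.Decode(dst); err != nil {
		return err
	}
	// A second Decode must hit EOF; a value or garbage means junk.
	var extra json.RawMessage
	if err := dec.Decode(&extra); !errors.Is(err, io.EOF) {
		return errTrailing
	}
	return nil
}

// accepts is a small helper for trying bodies against a limit.
func accepts(body string, limit int64) bool {
	var v map[string]any
	return decodeStrict(strings.NewReader(body), limit, &v) == nil
}

func main() {
	fmt.Println(accepts(`{"version":1}`, 64))
	fmt.Println(accepts(`{"version":1} {}`, 64))
	fmt.Println(accepts(`{"version":1}`, 4))
}
```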

component/networkpolicy/context.go:
- Introduces the seven sentinel errors (ErrMalformedBody,
  ErrInvalidVersion, ErrInvalidTTL, ErrTooManyInterfaces,
  ErrDuplicateIfaceName, ErrInvalidGatewayCombo, ErrInvalidField) and
  wraps every validation emission site with fmt.Errorf("%w: ...",
  ErrXxx). REST handlers route on these via errors.Is.
- invalidField(path, reason) / invalidGatewayCombo(path) helpers
  enforce the §5.4.8 "field: <path>, reason: <why>" message format
  uniformly. withIfacePrefix stamps the "interfaces[N]." prefix using
  the input-order index (pre-sort) so host can locate the bad iface
  by the position they sent.
- Per-iface validate now runs BEFORE the canonical sort (reorder
  vs. the skeleton) so every invalid_field / invalid_gateway_combo
  error carries the input-order iface index. Duplicate-name detection
  still happens post-sort but reports the name rather than a position
  that would be misleading.
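
The sentinel-wrapping pattern and the errors.Is routing it enables can be sketched as follows. The helper names `invalidField` and `withIfacePrefix` come from the commit message, but their bodies, the `classify` function, and the 400 status for validation failures are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// A subset of the sentinels; the error texts double as wire codes here.
var (
	ErrInvalidVersion    = errors.New("invalid_version")
	ErrTooManyInterfaces = errors.New("too_many_interfaces")
	ErrInvalidField      = errors.New("invalid_field")
)

// invalidField wraps the sentinel and applies the
// "field: <path>, reason: <why>" message format.
func invalidField(path, reason string) error {
	return fmt.Errorf("%w: field: %s, reason: %s", ErrInvalidField, path, reason)
}

// withIfacePrefix stamps the input-order index onto a field path.
func withIfacePrefix(index int, path string) string {
	return fmt.Sprintf("interfaces[%d].%s", index, path)
}

// classify maps an error to a (code, HTTP status) pair. Unrecognized
// errors become internal_error so the caller can retry.
func classify(err error) (string, int) {
	for _, s := range []error{ErrInvalidVersion, ErrTooManyInterfaces, ErrInvalidField} {
		if errors.Is(err, s) {
			return s.Error(), 400 // assumed status for validation errors
		}
	}
	return "internal_error", 500
}

func main() {
	err := invalidField(withIfacePrefix(0, "ssid"), "must not be empty")
	code, status := classify(err)
	fmt.Println(code, status, err)

	code, status = classify(errors.New("disk full"))
	fmt.Println(code, status)
}
```

Routing on the wrapped sentinel rather than on message text lets the human-readable part change without affecting the error code a client sees.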

hub/route/server.go: mounts /network/context alongside /dns and
/storage in the authenticated router group.

Tests (hub/route/network_test.go, ~20 tests): happy PUT / matched /
default / wire-encoding MatchedNone → null; every error code mapping
(malformed_body, missing version, invalid_version, invalid_ttl,
too_many_interfaces, duplicate_iface_name, invalid_gateway_combo,
invalid_field for bad MAC / bad iface_type, plus the composite
routing tests); message-format regression locking the "field: ...,
reason: ..." contract; trailing-JSON-junk rejection; oversize-body
rejection (12 MiB); 1.5 MiB schema-legal body acceptance; DELETE
happy + no-manager + preserves-groups-after-ctx-clear; GET happy +
no-manager.

context_test.go: assertion substrings updated to match the sentinel
message format ("invalid_version" / "field: interfaces[0].ssid" /
etc.) and a new regression covering input-order-index error paths.

Subsequent commits on this branch wire the manual-PUT hook
(/proxies/:name → HandleManualSet) and emit the external-controller
security warning.

Architecture §5.6.2 row 1: a user's manual pick via
PUT /proxies/:name on a network-policy-governed group must flip that
group's selection_source to `manual` so the next PUT
/network/context respects the pick when the network hasn't changed
(manual_locked), while still allowing auto to take over on a network
transition.

hub/route/proxies.go::updateProxy: after the existing selector.Set()
succeeds, if the proxy's adapter satisfies
networkpolicy.SelectorWithPolicy AND carries a non-empty
network-policy, call networkpolicy.Global().HandleManualSet(name).

The hook runs AFTER Set() returns so no Selector-internal lock is
held when acquiring the manager's globalMu → m.mu chain. This honors
the lock order documented at the top of install.go
(globalMu → sel.mu): concurrent Install is free to take sel.mu during
inheritFrom's selected-migration step because the REST handler
never bridges the two.

Groups without a network-policy keep the legacy behavior unchanged —
the existing cachefile.SetSelected + SwitchProxiesCallback already
handle them.

Architecture §5.2.1: when the user enables network-policy, a
non-loopback external-controller endpoint without a `secret:` (or
without strong mutual-TLS on the TLS variant) lets anyone on the
network manipulate proxy selection via PUT /network/context. This is
informational — mihomo doesn't block or reconfigure; the warning just
makes the risk visible at startup.

hub/executor/network_policy_warn.go:
- warnNetworkPolicyExternalController fires when (a) at least one
  select group has a non-empty network-policy AND (b) the TCP
  external-controller is bound to a non-loopback address AND (c) no
  Secret is set AND (d) the endpoint isn't protected by strong mTLS.
- Strong mTLS exemption requires BOTH a client-auth-cert AND
  ClientAuthType == "require-and-verify" (case-insensitive). Weaker
  auth modes (`request` / `verify-if-given` / `require-any`) do NOT
  exempt — they don't actually reject unauthorized clients.
- Loopback detection uses a static allowlist of well-known hostname
  aliases (localhost / ip6-localhost / ip6-loopback) plus IP literal
  checks; unknown hostnames are conservatively treated as exposed.
  No DNS lookup at startup.
- Unix-socket and named-pipe bindings are intentionally out of scope
  — their access control is a filesystem-permissions concern, not a
  network one.
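
The four-part warning condition can be sketched like this. The function names `isNonLoopbackBind` and `hasStrongClientAuth` appear in the commit message, but the signatures, the `shouldWarn` wrapper, and the handling of empty and unparseable bind strings are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// Static allowlist of hostname aliases; no DNS lookup is performed.
var loopbackAliases = map[string]bool{
	"localhost":     true,
	"ip6-localhost": true,
	"ip6-loopback":  true,
}

// isNonLoopbackBind reports whether a host:port bind is reachable
// from outside the machine. Unknown hostnames count as exposed.
func isNonLoopbackBind(addr string) bool {
	if addr == "" {
		return false // assumption: empty means no TCP controller
	}
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return true // assumption: unparseable input is treated as exposed
	}
	if host == "" {
		return true // ":9090" listens on every interface
	}
	if loopbackAliases[strings.ToLower(host)] {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return true // unknown hostname, conservatively exposed
	}
	return !ip.IsLoopback()
}

// hasStrongClientAuth requires a client cert AND require-and-verify.
func hasStrongClientAuth(clientAuthCert, clientAuthType string) bool {
	return clientAuthCert != "" &&
		strings.EqualFold(clientAuthType, "require-and-verify")
}

// shouldWarn combines conditions (a) through (d) from the text.
func shouldWarn(hasPolicyGroup bool, bind, secret, cert, authType string) bool {
	return hasPolicyGroup &&
		isNonLoopbackBind(bind) &&
		secret == "" &&
		!hasStrongClientAuth(cert, authType)
}

func main() {
	fmt.Println(shouldWarn(true, "0.0.0.0:9090", "", "", ""))
	fmt.Println(shouldWarn(true, "127.0.0.1:9090", "", "", ""))
	fmt.Println(shouldWarn(true, "0.0.0.0:9090", "", "ca.pem", "require-any"))
}
```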

hub/executor/executor.go: ApplyConfig's updateNetworkPolicy now ends
with a call to warnNetworkPolicyExternalController so the check runs
in the same config-apply dispatch the feature itself is installed
in.

Tests (hub/executor/network_policy_warn_test.go):
- isNonLoopbackBind across IPv4/IPv6 loopback, wildcard binds, LAN
  addresses, hostname aliases (including case-insensitive match),
  and unparseable inputs.
- hasStrongClientAuth across nil / empty / all ClientAuthType values
  including `require-any` as an explicit non-exempt reference.

Independent of the REST handler and manual-PUT hook: the warning only
reads the config and iterates cfg.Proxies for policy groups, so it
could be landed before or after the other M4 pieces. Kept here as a
standalone commit to keep the security-oriented change reviewable
in isolation.

Holistic review surfaced that `component/networkpolicy/` exports a
larger surface than any caller actually needs. Cross-package
consumers (`adapter/outboundgroup`, `config`, `hub/executor`,
`hub/route`, plus their tests) only reference 14 names; the remaining
exported identifiers are package-internal detail that leaked through
the exported boundary when the package was built up commit-by-commit.

Shrunk to unexported:
- MaxTTLSeconds → maxTTLSeconds (TTL ceiling used only inside manager)
- DefaultBarrierTimeout → defaultBarrierTimeout (startup-barrier tuning)
- PersistVersion → persistVersion (bucket schema tag; cachefile consumes
  opaque bytes and never references the integer directly)
- IsValidIfaceType → isValidIfaceType (validation helper; only
  context.go validate() calls it)
- PersistedState → persistedState (whole bucket record; REST layer
  never references the Go type)
- PersistedState.Validate → .validate (method follows type)
- PersistedState.MarshalValidated → .marshalValidated (ditto; paired
  with writeNetworkPolicyState)
- WriteNetworkPolicyState → writeNetworkPolicyState (internal write
  helper consumed by manager.persistStateLocked)
- Manager.OnCandidateSetDirty → .onCandidateSetDirty (architecture
  §5.6.3 reserved hook for provider-refresh callers; none exist in
  this PR's scope, so it stays internal until real external callers
  land; an exported-but-unused API invites upstream reviewers to
  question the PR's completeness)

Kept exported (real cross-package consumers identified via grep):
  GroupPolicy, GroupSource, ParseGroupPolicy, Network, ParseNetworks,
  Install, Uninstall, SelectorWithPolicy, Global, NetworkContext,
  PutResult, StatusResult, GroupStatus, ApplyResult, MatchedNone,
  DefaultKey, Matcher, ParseMatch, Match, NewManager, MaxInterfaces,
  ReasonXxx (6), SourceXxx (3), ErrXxx (7).

MarshalJSON / UnmarshalJSON on persistedState remain exported — that's
required for encoding/json to find them via the json.Marshaler /
json.Unmarshaler interfaces; Go's visibility rules are on identifiers,
not on enclosing types.

Also tightened two cross-package type assertions flagged by the same
review to use the SelectorWithPolicy interface rather than a concrete
*outboundgroup.Selector:
  - hub/executor/executor.go::collectNetworkPolicySelectors
  - config/config.go::warnOrphanNetworks

The concrete-type assertion was redundant — SelectorWithPolicy already
captured everything both call sites needed (Name / NetworkPolicy), and
matches the style already used by network_policy_warn.go's
hasAnyNetworkPolicyGroup. This also lets a future Fallback / URLTest
implementing SelectorWithPolicy be picked up without an executor-side
code change.
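
A rough sketch of why asserting against the interface is enough for the collection step. The `SelectorWithPolicy` method set shown here is reduced to the two methods the text names, the `map[string]string` return type is a simplification, and the `selector` / `urlTest` types are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sort"
)

// SelectorWithPolicy is reduced to the two methods both call sites
// need; the real interface has more.
type SelectorWithPolicy interface {
	Name() string
	NetworkPolicy() map[string]string
}

// selector is a stand-in for a select group.
type selector struct {
	name   string
	policy map[string]string
}

func (s *selector) Name() string                     { return s.name }
func (s *selector) NetworkPolicy() map[string]string { return s.policy }

// urlTest is a stand-in for a group type without a policy method.
type urlTest struct{ name string }

func (u *urlTest) Name() string { return u.name }

// collect keeps every adapter that satisfies the interface and has a
// non-empty policy, sorted by name for deterministic output.
func collect(adapters []any) []SelectorWithPolicy {
	var out []SelectorWithPolicy
	for _, a := range adapters {
		sp, ok := a.(SelectorWithPolicy)
		if !ok || len(sp.NetworkPolicy()) == 0 {
			continue
		}
		out = append(out, sp)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name() < out[j].Name() })
	return out
}

func main() {
	groups := collect([]any{
		&selector{"work", map[string]string{"office": "DIRECT"}},
		&urlTest{"auto"},
		&selector{"plain", nil},
		&selector{"home", map[string]string{"default": "DIRECT"}},
	})
	for _, g := range groups {
		fmt.Println(g.Name())
	}
}
```

Any future group type that gains the two methods would be picked up by `collect` without touching the loop.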

Pure refactor — no behavior change. Builds and all test suites remain
clean.

- docs/config.yaml grows a comment-only networks: + network-policy:
  example showing the multi-interface match grammar (ssid / bssid /
  iface-type / gateway-ip / gateway-mac / dns-suffix / subnets,
  with any: / all: / not: combinators and the same-iface atomic AND
  rule), a select-type proxy-group with a Mapping including the
  reserved default key, and a short summary of the manual-wins
  state machine (with the missing_target carve-out) and the
  YAML-vs-wire unknown-field policy.
- docs/api.md is a new reference covering every route registered in
  hub/route/ (proxies, groups, rules, providers, connections, DNS,
  cache, storage, logs/traffic/memory/connections websockets,
  configs, restart/upgrade, GeoX, UI, version/hello, debug) and
  the /network/context contract in full: the NetworkContext wire
  schema (interfaces[] + global dns_suffix, tri-state metered,
  ttl), the PutResponse applied[] shape with the seven-reason
  enumeration, the manual-wins state machine (including the
  matched-changed-but-missing_target branch), the Status groups[]
  output with tri-state wire-null encoding for matched_network /
  last_matched_network, and the full error-code matrix
  (malformed_body / invalid_version / invalid_ttl /
  too_many_interfaces / duplicate_iface_name / invalid_field /
  invalid_gateway_combo / internal_error) plus the
  {"code","message"} envelope used only by /network/context.
- Also documents the tun.device field of the GET /configs response as
  part of the host-sampler listener contract.

sing_tun.New writes the auto-resolved TUN name back to options.Device
on its two "Device needs auto-detect" paths already (the empty-Device
path and the !checkTunName path, both reached when FileDescriptor ==
0). The fd-override path, which runs getTunnelName(fd) to discover
the real iface name behind a caller-supplied file descriptor, updated
only a local variable — so *Listener.Config() kept echoing the
caller's original (possibly empty) Device on that path.

Mirror the resolved name into options.Device there as well, so
Listener.Config() reflects the bound interface on every path. This
value flows unchanged to GET /configs via listener.GetTunConf() ->
tunLister.Config() while TUN is running.

Host tools that rely on /configs.tun.device for "filter mihomo's own
TUN" decisions — e.g. agents that push the active interface set to
the core via PUT /network/context — no longer need to fall back to
name-prefix heuristics when the user doesn't explicitly set
tun.device.

Note: on macOS the resolved utun name is still best-effort — it is
derived from CalculateInterfaceName()'s "smallest unused utunN" scan
before the underlying tun device is opened, so a racing allocation
can leave Config().Device reporting the pre-bind estimate. Tightening
this to a true post-bind read-back is outside the scope of this
commit.

docs/config.yaml originally pointed to docs/network-policy.md, but that
file is not included in the PR; it now points to docs/api.md §11
(consistent with the references used elsewhere in the PR).
@wangwei354 wangwei354 force-pushed the feature/network-policy branch from 8e415a6 to 6670d21 Compare May 20, 2026 09:32