docs: per-object state metrics#3104
Conversation
Signed-off-by: ianisimov <[email protected]>
|
🌿 Preview your docs: https://nvidia-preview-pull-request-3104.docs.buildwithfern.com/infra-controller |
Summary by CodeRabbit
WalkthroughThis PR adds a design document (docs/design/per-object-state-metrics.md) proposing per-object state progress metrics for a state-controller system. It defines a new gauge-only Prometheus metric catalog, an opt-in dedicated endpoint, cardinality controls, query examples, non-goals, and an implementation approach. ChangesPer-object state metrics design proposal
Estimated code review effort: 1 (Trivial) | ~5 minutes Related Issues: Not specified in the provided diff. Related PRs: Not specified in the provided diff. Suggested labels: documentation, design-proposal Suggested reviewers: Not specified in the provided diff. Poem: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔍 Container Scan SummaryNo Grype artifacts were found to aggregate. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/design/per-object-state-metrics.md (1)
222-248: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winCall out the registry API break explicitly.
PerObjectMetricsRegistryis currently classification-specific and registers one hard-coded gauge on the supplied meter. The implementation plan should say this is a breaking expansion, not a drop-in reuse, so the existing health metric and the new state/_info gauges stay clearly separated.Suggested text change
- We extend it rather than adding a sibling: + We extend it rather than adding a sibling, but this is a breaking API expansion: the current classification-only gauge stays intact while new handles are added for state and `_info` metrics:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design/per-object-state-metrics.md` around lines 222 - 248, The plan should explicitly state that reusing PerObjectMetricsRegistry is a breaking API expansion, not a drop-in reuse, because it is currently classification-specific and only registers one hard-coded gauge. Update the description around PerObjectMetricsRegistry, gauge(name, description), and the processor.rs hookup to call out that the registry shape and handle API must change, while keeping the existing health metric separate from the new state and _info gauges on the per-object meter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/per-object-state-metrics.md`:
- Around line 119-135: The `carbide_machine_instance_info` example currently
includes free-form `tenant_org`, which violates the bounded-label rule for
`_info` metrics. Update the documentation around the `carbide_object_info` and
`carbide_machine_instance_info` examples to remove the raw tenant string and, if
tenant attribution is needed, describe using a stable internal tenant ID/code
instead; keep the label set limited to closed-set traits plus object IDs.
- Around line 55-59: Qualify the scrape-interval guidance in the design doc so
it only applies to the state-transition series, since the `_info` and
association metrics can still change on inventory or topology updates. Update
the wording around the scrape advice near the `run.rs::create_metrics()`
discussion to explicitly distinguish the slow-changing state series from the
freshness-sensitive metrics, and avoid implying that all of the new metrics can
safely be scraped every 60–120s.
---
Nitpick comments:
In `@docs/design/per-object-state-metrics.md`:
- Around line 222-248: The plan should explicitly state that reusing
PerObjectMetricsRegistry is a breaking API expansion, not a drop-in reuse,
because it is currently classification-specific and only registers one
hard-coded gauge. Update the description around PerObjectMetricsRegistry,
gauge(name, description), and the processor.rs hookup to call out that the
registry shape and handle API must change, while keeping the existing health
metric separate from the new state and _info gauges on the per-object meter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 10f87234-fdf5-450f-b024-f03862a708d3
📒 Files selected for processing (1)
docs/design/per-object-state-metrics.md
Design describing #2186 implementation.
https://github.com/yoks/bare-metal-manager-core/blob/d7497715d555d0bdc120766a8ebda0c77d56ffb0/docs/design/per-object-state-metrics.md
Related issues
#2186
Type of Change
Breaking Changes
Testing
Additional Notes