Hi,
While integrating SkillOpt into our task, we noticed a few points where the paper's description and the current implementation seem to differ. A few questions:
- Slow Update field: is it intended to be sent to the target model?
In best_skill.md, the block between the SLOW_UPDATE_START/END markers contains concrete classification rules (e.g., "price list with pet fee → difficult_positive"). The current code in rollout.py's _build_system() passes the full skill content, including this block, to the target model verbatim. Is this the intended behavior?
The paper says the slow update writes "longitudinal guidance" that passes through the validation gate. That implies it should be deployed with the skill, but the wording "guidance block" was ambiguous to us at first.
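To make the question concrete, here is a minimal sketch of the two behaviors we are asking about. It is not taken from rollout.py: the exact marker strings and the names split_slow_update and build_system_prompt are our own assumptions for illustration.

```python
import re
from typing import Tuple

# Hypothetical marker strings; the real ones in best_skill.md may be formatted differently.
START = "<!-- SLOW_UPDATE_START -->"
END = "<!-- SLOW_UPDATE_END -->"


def split_slow_update(skill_text: str, start: str = START, end: str = END) -> Tuple[str, str]:
    """Return (skill_without_block, block_body); block_body is "" if no markers are found."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    match = pattern.search(skill_text)
    if match is None:
        return skill_text, ""
    without = skill_text[:match.start()] + skill_text[match.end():]
    return without, match.group(1).strip()


def build_system_prompt(skill_text: str, include_slow_update: bool = True) -> str:
    """Illustrative stand-in for the prompt builder (not the actual _build_system)."""
    if include_slow_update:
        # Behavior we currently observe: the full skill, block included, is passed verbatim.
        return skill_text
    # Alternative reading: the block is optimizer-side metadata and is stripped.
    stripped, _ = split_slow_update(skill_text)
    return stripped
```

With include_slow_update=True the sketch mirrors what we see today; with False it mirrors how we first (mis)read the paper.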
- Meta Skill: is it implemented?
The paper describes a second mechanism: "The meta skill is optimizer-side only. It summarizes which edit patterns helped, which were rejected, and which failures persisted across epochs. This meta guidance is prepended to future optimizer prompts."
We don't see this implemented in the current codebase. The optimizer prompt (slow_update prompt) does receive the previous slow update content (prev_slow_update_content), but there's no separate accumulating meta record of patch-level edit history across steps/epochs. Is this intentionally deferred, or is there an implementation we missed?
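For reference, this is roughly the kind of accumulating record we expected to find, based only on the paper's wording. Everything here is hypothetical: EditRecord, MetaSkill and build_optimizer_prompt are names we made up, and we are assuming that "helped" means the patch passed the validation gate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EditRecord:
    epoch: int
    description: str
    accepted: bool  # assumption: True means the patch passed the validation gate


@dataclass
class MetaSkill:
    """Optimizer-side record of edit history; never sent to the target model."""
    edits: List[EditRecord] = field(default_factory=list)
    persistent_failures: List[str] = field(default_factory=list)

    def record(self, epoch: int, description: str, accepted: bool) -> None:
        self.edits.append(EditRecord(epoch, description, accepted))

    def render(self) -> str:
        helped = [e.description for e in self.edits if e.accepted]
        rejected = [e.description for e in self.edits if not e.accepted]
        sections = [
            ("Edits that helped", helped),
            ("Edits that were rejected", rejected),
            ("Failures that persisted", self.persistent_failures),
        ]
        lines = ["Meta guidance (optimizer-side only)"]
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items or ["(none)"])
        return "\n".join(lines)


def build_optimizer_prompt(meta: MetaSkill, base_prompt: str) -> str:
    # The paper says the meta guidance is prepended to future optimizer prompts.
    return meta.render() + "\n\n" + base_prompt
```

If something equivalent already exists under a different name, a pointer to it would be enough.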
- Naming suggestion (optional)
The SLOW_UPDATE_START/END markers currently hold domain rules that are part of the deployed skill. The name "slow update" emphasizes the process (epoch-wise update), not the content (domain rules). This caused us to initially treat the block as internal optimizer metadata rather than deployable classification rules. Would renaming the markers (e.g., EPOCH_RULES_START/END) or adding clearer documentation help clarify the distinction?
Thanks