Support metric_direction: lower in program frontmatter (lower-is-better programs) #49

@mrjf

Summary

Autoloop currently assumes higher is better everywhere: in the best_metric comparison in the scheduler, in the "metric improved" check in the iteration loop, in the iteration-history delta formatting, and implicitly in the halting-condition rule for programs with a target-metric. Programs whose natural fitness is lower-is-better (minimizing a ratio, error, latency, cost, or fitness score) currently have to invert their metric in the Evaluation block, which makes the value hard to read in iteration comments and inverts the semantics of target-metric in unintuitive ways.

Add first-class support for metric_direction: lower in the program frontmatter, and thread it through everywhere a metric comparison or delta is computed.

Motivation

  • OpenEvolve programs (proposed in sibling issue #47, "Add a strategy system; ship OpenEvolve as the first specialized iteration playbook") typically minimize a fitness ratio (candidate / reference; lower means the candidate beats the reference). Tracking best_metric = min(history) is natural; having to negate the metric and remember the sign adds cognitive overhead.
  • Latency / cost / error / bundle-size optimization programs are all lower-is-better. Users shouldn't have to invert their metric to use autoloop.
  • target-metric becomes confusing under inversion: a program targeting "reach ratio ≤ 0.9" would currently have to encode the target as -0.9 or 1/0.9, with cryptic comparison logic.

Today every user who wants lower-is-better has to either invert in their Evaluation script (ugly) or read the iteration comments "backwards" (confusing).

Proposed changes

1. Frontmatter field

---
schedule: every 6h
metric_direction: lower   # defaults to "higher" if omitted
target-metric: 0.9        # interpreted as "program is complete when best_metric ≤ 0.9"
---

Values: higher (default, current behaviour) or lower. Reject anything else at frontmatter-parse time.
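
The validation rule above could look roughly like this. It is a minimal sketch that assumes the frontmatter has already been parsed into a dict; the helper name `parse_metric_direction` and the dict shape are hypothetical, and the real `parse_program_frontmatter` may be structured differently.

```python
VALID_DIRECTIONS = ("higher", "lower")


def parse_metric_direction(frontmatter: dict) -> str:
    """Return the metric direction from parsed frontmatter (hypothetical helper).

    A missing field defaults to "higher"; any other value is rejected.
    """
    raw = frontmatter.get("metric_direction")
    if raw is None:
        return "higher"  # default keeps current behaviour
    value = str(raw).strip()
    if value not in VALID_DIRECTIONS:
        raise ValueError(
            f"metric_direction must be one of {VALID_DIRECTIONS}, got {raw!r}"
        )
    return value
```

Raising at parse time means a typo such as `metric_direction: low` fails loudly instead of silently falling back to `higher`.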

2. Scheduler (workflows/scripts/autoloop_scheduler.py)

  • parse_program_frontmatter already parses schedule and target-metric; extend it to return metric_direction.

  • Plumb the value through parse_program_frontmatter's callers into all_programs[name] and /tmp/gh-aw/autoloop.json.

  • Emit a new field in autoloop.json:

    "selected_metric_direction": "lower"

3. Agent prompt (workflows/autoloop.md)

Four places need direction-aware logic:

a. The "metric improved" check in Step 5 (Accept or Reject)

-**If the metric improved** (or this is the first run establishing a baseline):
+**If the metric improved** (or this is the first run establishing a baseline).
+Improvement is direction-aware:
+- If `metric_direction` is `higher` (default): improved = `new > best_metric`.
+- If `metric_direction` is `lower`: improved = `new < best_metric`.
+Read `selected_metric_direction` from `/tmp/gh-aw/autoloop.json` to know which.
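
For illustration, the rule in the diff above can be written as a small function. In autoloop itself this is prompt text the agent follows, so the function name `is_improved` and its signature are hypothetical, and treating a missing `best_metric` as the baseline run is an assumption about how the first run is represented.

```python
from typing import Optional


def is_improved(
    new: float, best_metric: Optional[float], direction: str = "higher"
) -> bool:
    """Direction-aware improvement check (illustrative sketch)."""
    if best_metric is None:
        return True  # first run establishes the baseline
    if direction == "higher":
        return new > best_metric
    if direction == "lower":
        return new < best_metric
    raise ValueError(f"unknown metric_direction: {direction!r}")
```

Both comparisons are strict, so a run that merely ties the current best is not accepted in either direction.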

b. The best_metric update in the state file

Currently "set best_metric" assumes replace-if-higher. Change it to "set best_metric to the new value" (improvement was already validated in the step above), and separately tell the state-file reader in the pre-step which direction is in effect. The scheduler comparison that picks the most-overdue program doesn't depend on metric direction, but the display delta does.

c. The Iteration History delta formatting

-Prepend an entry to **📊 Iteration History** (newest first) with status ✅, metric, PR link, the fix-attempt count if `> 0`, and a one-line summary…
+Prepend an entry to **📊 Iteration History** (newest first) with status ✅, metric, **signed delta** (`+N` for `higher`-direction programs, `-N` for `lower`-direction programs; both are "improvement" arrows), PR link, the fix-attempt count if `> 0`, and a one-line summary…
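
One way to produce the signed delta is to always compute `new - previous_best` and let the sign fall out naturally: an accepted iteration then shows `+N` for `higher` programs and `-N` for `lower` programs. The sketch below assumes that convention; the function name and the four-significant-digit formatting are illustrative choices, not part of the proposal.

```python
from typing import Optional


def format_signed_delta(new: float, previous_best: Optional[float]) -> str:
    """Format new - previous_best with an explicit sign (illustrative sketch).

    Returns an empty string on the baseline run, when there is nothing
    to compare against.
    """
    if previous_best is None:
        return ""
    delta = new - previous_best
    # "+" forces a sign on positive values; ".4g" keeps 4 significant digits.
    return f"{delta:+.4g}"
```

For example, going from 1.5 to 1.3 in a `lower` program yields `-0.2`, and going from 0.80 to 0.82 in a `higher` program yields `+0.02`.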

d. Halting condition

-If the program has a `target-metric` in its frontmatter and the new `best_metric` meets or surpasses the target, mark the program as completed.
+If the program has a `target-metric` in its frontmatter:
+- `metric_direction: higher`: completed when `best_metric >= target-metric`.
+- `metric_direction: lower`: completed when `best_metric <= target-metric`.
+Mark the program as completed (set `Completed: true`, remove the `autoloop-program` label, add `autoloop-completed`).
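
The halting rule in the diff above, sketched as a function. As with the improvement check, this lives in the agent prompt rather than in code, so `target_reached` and its handling of missing values are assumptions made for the example.

```python
from typing import Optional


def target_reached(
    best_metric: Optional[float],
    target_metric: Optional[float],
    direction: str = "higher",
) -> bool:
    """Direction-aware halting condition (illustrative sketch)."""
    if best_metric is None or target_metric is None:
        return False  # no target configured, or no metric recorded yet
    if direction == "higher":
        return best_metric >= target_metric
    if direction == "lower":
        return best_metric <= target_metric
    raise ValueError(f"unknown metric_direction: {direction!r}")
```

Unlike the improvement check, these comparisons are inclusive: hitting the target exactly counts as completion.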

4. Machine State table

Add a row:

| Metric Direction | lower |

For backward compatibility, if the row is absent, treat the program as higher. On the first iteration after this change lands for an existing program, the agent adds the row with the value from frontmatter (or higher if absent).
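
The backward-compatible read could be sketched as follows, assuming the Machine State table is a two-column Markdown table. The function name and the other row names in the usage example are hypothetical.

```python
def read_metric_direction(state_markdown: str) -> str:
    """Read the "Metric Direction" row, defaulting to "higher" (sketch)."""
    for line in state_markdown.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] == "Metric Direction":
            return cells[1] or "higher"
    return "higher"  # row absent: pre-existing state file


# Usage with a made-up state table:
state = """
| Field | Value |
|---|---|
| Best Metric | 1.3 |
| Metric Direction | lower |
"""
print(read_metric_direction(state))
```

Because an absent row and an absent frontmatter field both resolve to `higher`, existing state files need no migration.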

5. Tests

In tests/, add fixtures for metric_direction: lower:

  • Given best_metric = 1.5 and new metric 1.3 with metric_direction: lower, improvement returns true.
  • Given best_metric = 1.5 and new metric 1.7 with metric_direction: lower, improvement returns false.
  • Given target-metric: 0.9 and best_metric: 0.85 with metric_direction: lower, halting condition returns true.
  • Default (direction omitted) behaves as higher exactly as today — no regression for existing programs.

Backward compatibility

  • Programs without the field default to higher — no change in behaviour.
  • Machine State row is optional; absence is treated as higher.
  • No migration needed for existing state files.

Acceptance

  • A new program with metric_direction: lower in frontmatter has its best_metric ratchet downward, and iteration comments show -<delta> as improvement.
  • A program with target-metric: 0.9 and metric_direction: lower completes when best_metric reaches 0.9 or below.
  • All existing programs (implicit higher) keep working identically.
  • Tests for both directions land in tests/ and pass.
