Support metric_direction: lower in program frontmatter (lower-is-better programs)

## Summary

Autoloop currently assumes **higher is better** everywhere: in the `best_metric` comparison in the scheduler, in the "metric improved" check in the iteration loop, in the iteration-history delta formatting, and implicitly in the halting-condition rule for programs with a `target-metric`. Programs whose natural fitness is *lower is better* (ratio, error, latency, cost, fitness score) have to invert their metric in the `Evaluation` block, which makes the value hard to read in iteration comments and flips the meaning of `target-metric` in unintuitive ways.

Add first-class support for `metric_direction: lower` in the program frontmatter, and thread it through everywhere a metric comparison or delta is computed.

## Motivation

- **OpenEvolve programs** (proposed in sibling issue #47) typically minimize a fitness ratio (candidate / reference, where lower means our candidate beats the reference). Tracking `best_metric = min(history)` is natural; having to negate the value and remember the sign adds cognitive overhead.
- **Latency / cost / error / bundle-size optimization programs** are all lower-is-better. Users shouldn't have to invert their metric to use autoloop.
- `target-metric` becomes confusing under inversion: a program targeting "reach ratio ≤ 0.9" would currently have to encode the target as `-0.9` or `1/0.9`, with cryptic comparison logic.

Today every user who wants lower-is-better has to either invert in their `Evaluation` script (ugly) or read the iteration comments "backwards" (confusing).

## Proposed changes

### 1. Frontmatter field

```yaml
---
schedule: every 6h
metric_direction: lower   # defaults to "higher" if omitted
target-metric: 0.9        # interpreted as "program is complete when best_metric ≤ 0.9"
---
```

Values: `higher` (default, current behaviour) or `lower`. Reject anything else at frontmatter-parse time.
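
A minimal sketch of the parse-time validation, assuming the frontmatter has already been loaded into a plain dict. The helper name `parse_metric_direction` and the `VALID_DIRECTIONS` constant are hypothetical and do not exist in the scheduler today; matching is deliberately strict (no case folding), which is an assumption about what "reject anything else" should mean.

```python
VALID_DIRECTIONS = ("higher", "lower")


def parse_metric_direction(frontmatter: dict) -> str:
    """Return the validated direction, defaulting to "higher" when omitted.

    Hypothetical helper: `frontmatter` is assumed to be the already-parsed
    frontmatter mapping of a program file.
    """
    raw = frontmatter.get("metric_direction")
    if raw is None:
        return "higher"  # field omitted: keep current behaviour
    if raw not in VALID_DIRECTIONS:
        # Also rejects non-string values such as YAML booleans or numbers.
        raise ValueError(
            f"metric_direction must be one of {VALID_DIRECTIONS}, got {raw!r}"
        )
    return raw
```

Failing at parse time means a typo such as `metric_direction: low` surfaces in the scheduler run rather than silently falling back to `higher`.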

### 2. Scheduler (`workflows/scripts/autoloop_scheduler.py`)

- `parse_program_frontmatter` already parses `schedule` and `target-metric`; extend it to return `metric_direction`.
- Plumb the value through `parse_program_frontmatter`'s callers into `all_programs[name]` and `/tmp/gh-aw/autoloop.json`.
- Emit a new field in `autoloop.json`:

  ```json
  "selected_metric_direction": "lower"
  ```
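
One way the new field could be written and read back is sketched below. Both function names and the shape of `payload` are assumptions made for illustration; the real scheduler may assemble `autoloop.json` differently. The reader treats a missing field as `higher`, so a file produced by an older scheduler still works.

```python
import json
from pathlib import Path


def write_autoloop_json(path: Path, payload: dict, metric_direction: str) -> None:
    """Write the scheduler output with the selected program's direction added."""
    out = dict(payload)  # copy so the caller's dict is not mutated
    out["selected_metric_direction"] = metric_direction
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(out, indent=2) + "\n", encoding="utf-8")


def read_metric_direction(path: Path) -> str:
    """Read the direction back, treating a missing field as "higher"."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data.get("selected_metric_direction", "higher")
```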

### 3. Agent prompt (`workflows/autoloop.md`)

Four places need direction-aware logic:

#### a. The "metric improved" check in Step 5 (Accept or Reject)

```diff
-**If the metric improved** (or this is the first run establishing a baseline):
+**If the metric improved** (or this is the first run establishing a baseline).
+Improvement is direction-aware:
+- If `metric_direction` is `higher` (default): improved = `new > best_metric`.
+- If `metric_direction` is `lower`: improved = `new < best_metric`.
+Read `selected_metric_direction` from `/tmp/gh-aw/autoloop.json` to know which.
```
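
The rule in the diff above can be expressed as a small predicate. This is a sketch only: the check lives in the agent prompt, so the function name `is_improved` and the use of `None` for "no baseline yet" are assumptions.

```python
from typing import Optional


def is_improved(
    new: float,
    best_metric: Optional[float],
    metric_direction: str = "higher",
) -> bool:
    """Direction-aware "metric improved" check (hypothetical helper)."""
    if best_metric is None:
        return True  # first run: the new value establishes the baseline
    if metric_direction == "lower":
        return new < best_metric
    if metric_direction == "higher":
        return new > best_metric
    raise ValueError(f"unknown metric_direction: {metric_direction!r}")
```

The comparison is strict in both directions, so an iteration that merely ties `best_metric` is not counted as an improvement.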

#### b. The `best_metric` update in the state file

Currently "set `best_metric`" assumes replace-if-higher. Make it "set `best_metric` to the new value", since improvement was already validated in the check above. The pre-step that reads the state file also needs to know the direction, but only for display: the scheduler comparison that picks the most-overdue program doesn't depend on metric direction, while the delta shown in the iteration history does.

#### c. The Iteration History delta formatting

```diff
-Prepend an entry to **📊 Iteration History** (newest first) with status ✅, metric, PR link, the fix-attempt count if `> 0`, and a one-line summary…
+Prepend an entry to **📊 Iteration History** (newest first) with status ✅, metric, **signed delta** (`+N` for `higher`-direction programs, `-N` for `lower`-direction programs; either sign indicates an improvement in its direction), PR link, the fix-attempt count if `> 0`, and a one-line summary…
```
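
Because the delta is simply `new - previous_best`, the sign falls out naturally; the direction only decides whether that sign counts as an improvement. A sketch, where the function name, the three-decimal precision, and the parenthesised note are all illustrative assumptions rather than a format the prompt prescribes:

```python
def format_delta(new: float, previous_best: float, metric_direction: str = "higher") -> str:
    """Format a signed delta and label whether it is an improvement."""
    delta = new - previous_best
    if metric_direction == "lower":
        improved = delta < 0
    else:
        improved = delta > 0
    note = "improvement" if improved else "no improvement"
    # The "+" format flag forces an explicit sign for positive values too.
    return f"{delta:+.3f} ({note})"


print(format_delta(1.3, 1.5, "lower"))    # -0.200 (improvement)
print(format_delta(0.82, 0.80, "higher")) # +0.020 (improvement)
```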

#### d. Halting condition

```diff
-If the program has a `target-metric` in its frontmatter and the new `best_metric` meets or surpasses the target, mark the program as completed.
+If the program has a `target-metric` in its frontmatter:
+- `metric_direction: higher`: completed when `best_metric >= target-metric`.
+- `metric_direction: lower`: completed when `best_metric <= target-metric`.
+Mark the program as completed (set `Completed: true`, remove the `autoloop-program` label, add `autoloop-completed`).
```
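
The halting rule differs from the improvement check in that the comparison is inclusive: reaching the target exactly counts. A sketch under the assumption that a missing `target-metric` is represented as `None`; the name `target_reached` is hypothetical.

```python
from typing import Optional


def target_reached(
    best_metric: float,
    target_metric: Optional[float],
    metric_direction: str = "higher",
) -> bool:
    """Return True when the program should be marked as completed."""
    if target_metric is None:
        return False  # no target-metric in frontmatter: never halts on metric
    if metric_direction == "lower":
        return best_metric <= target_metric
    return best_metric >= target_metric
```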

### 4. Machine State table

Add a row:

```markdown
| Metric Direction | lower |
```

For backward compatibility, if the row is absent, treat the program as `higher`. On the first iteration after this change lands for an existing program, the agent adds the row with the value from frontmatter (or `higher` if absent).
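
The fallback for state files without the row might look like the following. The regular expression assumes the row is written exactly as `| Metric Direction | <value> |` on its own line, and `direction_from_state` is a hypothetical name; the surrounding table layout in the usage example is illustrative only.

```python
import re

# Matches a two-column table row whose first cell is "Metric Direction".
_DIRECTION_ROW = re.compile(
    r"^\|\s*Metric Direction\s*\|\s*(\w+)\s*\|[ \t]*$",
    re.MULTILINE,
)


def direction_from_state(state_markdown: str) -> str:
    """Return the direction recorded in the state table, or "higher" if absent."""
    match = _DIRECTION_ROW.search(state_markdown)
    return match.group(1) if match else "higher"


legacy_state = "| Best Metric | 1.5 |\n"
print(direction_from_state(legacy_state))  # higher
```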

### 5. Tests

In `tests/`, add fixtures for `metric_direction: lower`:

- Given `best_metric = 1.5` and new metric `1.3` with `metric_direction: lower`, improvement returns `true`.
- Given `best_metric = 1.5` and new metric `1.7` with `metric_direction: lower`, improvement returns `false`.
- Given `target-metric: 0.9` and `best_metric: 0.85` with `metric_direction: lower`, halting condition returns `true`.
- Default (direction omitted) behaves as `higher` exactly as today — no regression for existing programs.

## Backward compatibility

- Programs without the field default to `higher` — no change in behaviour.
- Machine State row is optional; absence is treated as `higher`.
- No migration needed for existing state files.

## Acceptance

- A new program with `metric_direction: lower` in frontmatter has its `best_metric` ratchet downward, and iteration comments show `-<delta>` as improvement.
- A program with `target-metric: 0.9` and `metric_direction: lower` completes when `best_metric` reaches 0.9 or below.
- All existing programs (implicit `higher`) keep working identically.
- Tests for both directions land in `tests/` and pass.

## Related

- Sibling #47 (Strategy system + OpenEvolve) — OpenEvolve programs are the primary consumer of this field.
- Sibling #48 (Test-Driven) — may also want `lower` direction for programs whose metric is "failing tests count" or "lint violations count"; should work out of the box once this lands.

