Terminal bench 2 harness #145

Open
lekt9 wants to merge 3 commits into justrach:main from lekt9:terminal-bench-2-harness


lekt9 commented May 30, 2026

hi rach

lekt9 and others added 3 commits May 31, 2026 00:49
A Replace/ReplaceAll whose replacement is identical to the matched text
produced no change yet reported success, so the caller could consume a
turn believing an edit was applied. Return an explicit NoOpReplace error
instead. Adds in-file unit tests.

Co-Authored-By: blackfloofie-a codegraff agent <[email protected]>

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
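
The no-op check described in the commit above could look roughly like the following. This is a minimal Python sketch, not the code from the PR: `NoOpReplaceError` and `replace_text` are hypothetical names, and the not-found handling is an assumption added only to make the example complete.

```python
class NoOpReplaceError(ValueError):
    """Hypothetical error: the replacement would leave the text unchanged."""


def replace_text(content: str, search: str, replacement: str,
                 replace_all: bool = False) -> str:
    """Replace `search` with `replacement`, refusing edits that change nothing."""
    if not search:
        raise ValueError("search text must not be empty")
    if search not in content:
        raise ValueError("search text not found")
    # The case the commit targets: an identical replacement used to
    # "succeed" silently, so the caller believed an edit had been applied.
    if search == replacement:
        raise NoOpReplaceError("replacement is identical to the matched text")
    count = -1 if replace_all else 1  # -1 tells str.replace to replace every match
    return content.replace(search, replacement, count)
```

Raising a distinct error type lets the caller tell "nothing to change" apart from "match not found" without spending a turn on an edit that had no effect.
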
Memoize semantic-search results under a key whose hash folds in the
workspace index version (node_count + last_updated), so a re-indexed
workspace busts the cache instead of serving a stale result. Reuses the
existing KVStore; cache failures never break search. Adds unit tests for
key determinism and index-change invalidation.

Co-Authored-By: blackfloofie-a codegraff agent <[email protected]>

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
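
The cache-key scheme in the commit above can be illustrated with a short sketch. It assumes a dict-like store standing in for the project's KVStore, and the function names, the `limit` parameter, and the key prefix are all hypothetical. Only the idea of folding `node_count` and `last_updated` into the hashed key comes from the commit message.

```python
import hashlib
import json


def search_cache_key(query: str, limit: int, node_count: int,
                     last_updated: str) -> str:
    """Build a deterministic key that changes whenever the index version changes."""
    payload = json.dumps(
        {"query": query, "limit": limit,
         "node_count": node_count, "last_updated": last_updated},
        sort_keys=True,  # stable field order keeps the hash deterministic
    )
    return "semantic-search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_search(store, key, run_search):
    """Return a cached result if present; cache errors never break the search."""
    try:
        hit = store.get(key)
    except Exception:
        hit = None  # a failed read is treated as a cache miss
    if hit is not None:
        return hit
    results = run_search()
    try:
        store[key] = results
    except Exception:
        pass  # a failed write is ignored; the fresh results are still returned
    return results
```

Because the index version is part of the key, re-indexing never requires an explicit purge: old entries simply stop being looked up.
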
Adds benchmarks/terminal_bench: a Harbor BaseInstalledAgent that installs
the released graff binary in each task container, injects host provider
credentials, and runs 'graff --prompt' headlessly so codegraff can be
evaluated on Terminal-Bench 2.0. Includes run.sh and README.

Co-Authored-By: blackfloofie-a codegraff agent <[email protected]>

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
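
Two pieces of the harness described above, forwarding host credentials and building the headless command, can be sketched in plain Python. This deliberately leaves out the Harbor agent class, whose interface is not shown in this PR page. The environment variable names and helper names are hypothetical, and passing the task instruction as the value of `--prompt` is an assumption based on the commit message.

```python
import shlex

# Hypothetical credential names; the PR page does not list which variables are forwarded.
CREDENTIAL_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def collect_credentials(environ, names=CREDENTIAL_VARS):
    """Pick the provider credentials that are set (and non-empty) on the host."""
    return {name: environ[name] for name in names if environ.get(name)}


def build_headless_command(instruction: str) -> str:
    """Build the shell command that runs graff non-interactively on one task."""
    # shlex.quote keeps quotes, semicolons, and newlines in the task text
    # from being interpreted by the container's shell.
    return "graff --prompt " + shlex.quote(instruction)
```

A caller would pass `os.environ` to `collect_credentials` and hand the resulting mapping to whatever mechanism the harness uses to set the container environment.
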