It's important to verify that your tool actually reduces token usage without degrading agent performance. One idea is to reuse an existing benchmark like this one:
https://github.com/SWE-rebench/SWE-rebench-V2
I've been using a 50-task subset of it to benchmark Qwen models locally in Pi. It's relatively cheap to run (free in my case).
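For what it's worth, here is a rough sketch of how the comparison could be scored once you have run the same task subset twice, once without the tool and once with it. The `TaskResult` record and `compare_runs` helper are hypothetical names I made up for illustration; they are not part of SWE-rebench or Pi, and you would have to fill them from whatever logs your harness produces.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; populate it from your own harness logs.
    task_id: str
    resolved: bool       # did the agent's patch pass the task's tests?
    total_tokens: int    # prompt + completion tokens spent on the task


def compare_runs(baseline, with_tool):
    """Compare two runs over the tasks they have in common."""
    base = {r.task_id: r for r in baseline}
    tool = {r.task_id: r for r in with_tool}
    shared = sorted(base.keys() & tool.keys())
    if not shared:
        raise ValueError("the two runs have no tasks in common")

    base_tokens = sum(base[t].total_tokens for t in shared)
    tool_tokens = sum(tool[t].total_tokens for t in shared)
    if base_tokens == 0:
        raise ValueError("baseline run reports zero tokens")

    return {
        "tasks": len(shared),
        "baseline_resolved": sum(base[t].resolved for t in shared),
        "tool_resolved": sum(tool[t].resolved for t in shared),
        # Tasks solved without the tool but not with it.
        "regressions": [
            t for t in shared if base[t].resolved and not tool[t].resolved
        ],
        # Fraction of baseline tokens saved; negative means the tool cost more.
        "token_reduction": 1 - tool_tokens / base_tokens,
    }
```

Looking at the per-task `regressions` list as well as the totals helps, because on 50 tasks a difference of one or two resolved tasks can easily be noise, so it is probably worth repeating the runs before drawing conclusions.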