resolve_ratio computes average test-pass rate, not % Resolved #6

@darshanmakwana412

In Section 4 of the program bench paper, the two metrics are defined as:

  • % Resolved (primary): fraction of instances where all tests pass
  • % Tests Passed (secondary): average fraction of tests passing across instances

But the BatchEvalSummary.resolve_ratio property (eval_batch.py:83) appears to compute the second one:

@computed_field  # type: ignore[prop-decorator]
@property
def resolve_ratio(self) -> float:
    if not self.summaries:
        return 0.0
    return sum(s.score for s in self.summaries) / len(self.summaries)

Here s.score is n_resolved / n_tests for each instance, so this is the mean test-pass rate, not the fraction of fully resolved instances.

Is this intentional? If % Resolved is what resolve_ratio is supposed to represent, the correct way to compute it would be:

return sum(1 for s in self.summaries if s.score == 1.0) / len(self.summaries)
