
fix: normalize weights in functional_correctness_score to present categories#2

Merged
nv78 merged 2 commits into main from fix/normalize-functional-correctness-weights
Jun 13, 2026


sharonxz (Contributor) commented Jun 4, 2026

Suites missing one or more test categories were silently capped below 1.0: each missing category contributed 0 to the weighted sum while its weight still counted toward a fixed weight total. Renormalize the weights over only the categories present so scores remain on a 0-1 scale.
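A minimal sketch of the renormalization, assuming the score is a weighted mean of per-category pass rates in [0, 1]. The category names, the weight values, and the helper capped_score (which models the pre-fix behavior) are hypothetical; the PR does not show the actual implementation of functional_correctness_score.

```python
from typing import Mapping

# Hypothetical category weights (not taken from the PR); they sum to 1.0.
DEFAULT_WEIGHTS: dict[str, float] = {
    "unit": 0.5,
    "integration": 0.3,
    "edge_case": 0.2,
}


def capped_score(
    pass_rates: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Pre-fix behavior: missing categories add 0 but their weight still counts."""
    numerator = sum(w * pass_rates.get(cat, 0.0) for cat, w in weights.items())
    return numerator / sum(weights.values())


def functional_correctness_score(
    pass_rates: Mapping[str, float],
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted mean of per-category pass rates, renormalized to the
    categories actually present in ``pass_rates``."""
    present = [cat for cat in weights if cat in pass_rates]
    total_weight = sum(weights[cat] for cat in present)
    if total_weight == 0:
        # No recognized categories: nothing to score (a choice made for this sketch).
        return 0.0
    numerator = sum(weights[cat] * pass_rates[cat] for cat in present)
    return numerator / total_weight


if __name__ == "__main__":
    suite = {"unit": 1.0, "integration": 1.0}  # no edge_case tests
    print(capped_score(suite))                  # 0.8
    print(functional_correctness_score(suite))  # 1.0
```

For a suite with only unit and integration tests, all passing, the fixed-total version tops out at 0.8 while the renormalized version reaches 1.0. Returning 0.0 when no known category is present only avoids a division by zero in this sketch; the real code may handle that case differently.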

sharonxz and others added 2 commits June 4, 2026 18:13
…egories

Suites missing one or more test categories were silently capped below 1.0
because missing categories contributed 0 to a fixed-sum weight total.
Renormalize to only the categories present so scores remain on a 0-1 scale.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…count

setuptools.backends.legacy:build caused CI install failures on older
setuptools. Replaced with the standard setuptools.build_meta backend.
Also updated test_sample_tasks_length to match the current 10-task fixture.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
nv78 merged commit e19aa0d into main Jun 13, 2026
3 checks passed