fix(estimators): negate RenyiNeg and FisherRao to match uncertainty orientation #459

Open
Yedson54 wants to merge 1 commit into IINemo:main from Yedson54:fix/fisherrao-renyineg-orientation
Yedson54 commented May 19, 2026

Summary

RenyiNeg and FisherRao currently compute a token-level divergence/distance between the predictive distribution $p_t$ and the uniform distribution $u$, then average it over generated tokens.

For RenyiNeg, the current implementation returns $$U_{\mathrm{RenyiNeg}}(x) = \frac{1}{T} \sum_{t=1}^{T} D_\alpha(p_t \Vert u),$$ with no negative sign.

For FisherRao, the current implementation returns $$U_{\mathrm{FisherRao}}(x) = \frac{1}{T} \sum_{t=1}^{T} \mathrm{FR}(p_t, u),$$ again with no negative sign.

Both quantities increase when $p_t$ becomes more peaked and decrease as $p_t$ approaches the uniform distribution. The returned values therefore appear to behave as certainty/confidence scores rather than uncertainty scores.
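To illustrate the orientation issue, here is a minimal sketch using textbook definitions of the Rényi divergence and the Fisher-Rao distance on the probability simplex. The function names, the choice of `alpha=0.5`, and the absence of any normalization constant are assumptions for illustration; they are not taken from the LM-Polygraph source, which may differ in these details.

```python
import math


def renyi_divergence_to_uniform(p, alpha=0.5):
    """Textbook Renyi divergence D_alpha(p || u) against the uniform distribution.

    Hypothetical helper; alpha=0.5 is an assumed default, not the library's.
    """
    u = 1.0 / len(p)
    s = sum((pi ** alpha) * (u ** (1.0 - alpha)) for pi in p)
    return math.log(s) / (alpha - 1.0)


def fisher_rao_to_uniform(p):
    """Textbook Fisher-Rao distance 2 * arccos(sum_i sqrt(p_i * u_i)).

    Hypothetical helper; no normalization constant is applied here.
    """
    u = 1.0 / len(p)
    bc = sum(math.sqrt(pi * u) for pi in p)
    bc = min(1.0, max(0.0, bc))  # guard against rounding outside [0, 1]
    return 2.0 * math.acos(bc)


uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain token distribution
peaked = [0.97, 0.01, 0.01, 0.01]    # confident token distribution

# Raw quantities grow as the distribution sharpens (certainty-like behaviour).
raw_renyi = (renyi_divergence_to_uniform(uniform), renyi_divergence_to_uniform(peaked))
raw_fr = (fisher_rao_to_uniform(uniform), fisher_rao_to_uniform(peaked))

# Negating them restores "higher score => higher uncertainty".
neg_renyi = tuple(-v for v in raw_renyi)
neg_fr = tuple(-v for v in raw_fr)
```

Under these assumptions, the raw values rank the peaked distribution above the uniform one, while the negated values rank the uniform distribution highest, which is the orientation this PR aims for.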

From my understanding, most other uncertainty estimators and evaluation protocols in LM-Polygraph seem to follow the convention: $$\text{higher score} \Rightarrow \text{higher uncertainty}.$$

To keep the score orientation consistent with the rest of the library, I negate the returned values of RenyiNeg and FisherRao. The underlying divergence/distance computations are unchanged. Only the returned score orientation is modified. Docstrings are updated accordingly.

Breaking change [semantic; downstream usage; not functional]

Returned scores change sign.

Existing benchmark results, stored predictions, or downstream analyses using these estimators may therefore need to:

  • negate previously computed scores, or recompute them with the updated estimators.
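For stored results, the sign flip could be applied in a small post-processing step. The sketch below assumes scores are kept in a plain dictionary mapping estimator names to lists of floats; that storage layout and the `reorient_scores` helper are hypothetical and not part of LM-Polygraph.

```python
# Estimators whose orientation changes in this PR.
AFFECTED = {"RenyiNeg", "FisherRao"}


def reorient_scores(stored):
    """Return a copy of `stored` with scores negated for affected estimators.

    `stored` is assumed to map estimator name -> list of float scores
    (a hypothetical layout used only for this example).
    """
    return {
        name: [-s for s in scores] if name in AFFECTED else list(scores)
        for name, scores in stored.items()
    }


old_results = {
    "RenyiNeg": [0.9, 0.1],
    "FisherRao": [1.7, 0.2],
    "OtherEstimator": [0.3, 0.8],  # placeholder name, left untouched
}
new_results = reorient_scores(old_results)
```

Rank-based metrics computed from the old scores would need to be recomputed after this step, since negation reverses the ordering of the scores.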

Test plan

  • flake8 src/lm_polygraph/estimators/renyi_neg.py src/lm_polygraph/estimators/fisher_rao.py
  • pytest --ignore=test/local

…core convention

RenyiNeg and FisherRao compute a divergence/distance to the uniform
token distribution. The raw quantity increases when token distributions
are sharper, so it behaves as a certainty score.

Most LM-Polygraph uncertainty estimators use the opposite convention,
where higher scores indicate higher uncertainty. Negate the returned
scores and clarify the convention in the estimator docstrings.

BREAKING CHANGE: RenyiNeg and FisherRao scores change sign. Existing
results using these estimators should be reinterpreted accordingly or
recomputed.