BayesPE integration by rvz16 · Pull Request #409 · IINemo/lm-polygraph

rvz16 · 2025-12-14T20:05:22Z

Adds an implementation of BayesPE from the paper "Bayesian Prompt Ensembles: Model Uncertainty Estimation for Black-Box Large Language Models" (https://aclanthology.org/2024.findings-acl.728.pdf).

ArtemVazh

Please rebase the branch onto the latest main. Additionally, please add these methods to the README and the default_estimators config.

ArtemVazh · 2025-12-26T17:09:03Z

            )

    elif model_type == "Whitebox":
+        from lm_polygraph.stat_calculators.ensemble_probs import EnsembleProbsCalculator


EnsembleProbsCalculator should be exported from the __init__ of lm_polygraph.stat_calculators, and this local import should be removed.

ArtemVazh · 2025-12-26T17:14:53Z

    get_linear_schedule_with_warmup,
 )
+
+try:  # Transformers >=4.46 removes AdamW from top-level


This issue has already been resolved in main: "from torch.optim import AdamW" should work with the package versions in requirements.

ArtemVazh · 2025-12-26T17:16:47Z

+        return (["ensemble_probs"], ["input_texts"])
+
+
+def load_stat_calculator(config, environment):


It should be placed in a separate file, as is done for all other calculators here: lm_polygraph/defaults/stat_calculator_builders

ArtemVazh · 2025-12-26T17:17:47Z

        return result_dict
+
+
+def load_stat_calculator(config, environment):


Is this needed here? It doesn’t seem to be used.

ArtemVazh · 2025-12-26T17:51:01Z

-            )
+            if not (
+                str(s).startswith("blackbox_") and s[len("blackbox_") :] in have_stats
+            )  # remove blackbox_X from stats only if X is already in stats to remove duplicated run of stat calculator


It’s unclear why this modification is necessary.
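For readers following the hunk above, here is a minimal, self-contained sketch of what the added condition appears to do, based only on the diff lines and the inline comment. The function name and the sample stat names are hypothetical, and the surrounding code in manager.py is not shown in this thread, so this is an illustration rather than the actual implementation.

```python
def drop_duplicate_blackbox_stats(stats, have_stats):
    """Keep every stat except `blackbox_X` entries whose plain `X` is already in `have_stats`.

    Hypothetical helper: it mirrors only the condition visible in the diff hunk,
    not the rest of the logic in manager.py.
    """
    prefix = "blackbox_"
    kept = []
    for s in stats:
        # Same test as in the hunk: a blackbox_-prefixed name whose suffix is already available.
        is_duplicate = str(s).startswith(prefix) and s[len(prefix):] in have_stats
        if not is_duplicate:
            kept.append(s)
    return kept


# Illustrative stat names, chosen only for the example.
stats = ["greedy_texts", "blackbox_greedy_texts", "blackbox_sample_texts"]
have_stats = {"greedy_texts"}
print(drop_duplicate_blackbox_stats(stats, have_stats))
```

Under this reading, a `blackbox_X` entry is kept whenever `X` is not already available, which differs from dropping every `blackbox_`-prefixed entry unconditionally.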

ArtemVazh · 2025-12-26T17:52:08Z

+from lm_polygraph.utils.model import WhiteboxModel
+
+
+def test_bayespe_zero_shot_end_to_end():


This test should be moved to test/local.

ArtemVazh · 2025-12-26T17:52:34Z

    estimator = Focus(
        model_name=model_name,
-        path="../token_idf/{model_name}/token_idf.pkl",
+        path=f"../focus_data/{model_name}/token_idf.pkl",


It’s unclear why this modification is necessary.
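One observable effect of the hunk above, separate from the directory rename, is the added `f` prefix. The short sketch below shows the difference in plain Python. The model name is a made-up placeholder, and whether this was the author's motivation is not stated in the thread.

```python
model_name = "example-model"  # hypothetical value, for illustration only

# Without the f prefix the braces are kept literally, so the path
# contains the text "{model_name}" rather than the model's name.
plain = "../token_idf/{model_name}/token_idf.pkl"

# With the f prefix the placeholder is interpolated.
formatted = f"../focus_data/{model_name}/token_idf.pkl"

print(plain)
print(formatted)
```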

ArtemVazh · 2025-12-26T17:55:13Z

    assert isinstance(ue.uncertainty, float)
+
+
+def test_bayespe_zero_shot():


It would be better to refactor the test_bayespe_zero_shot and test_bayespe_few_shot tests to use the standard estimate_uncertainty function, while keeping these tests as additional ones.

ArtemVazh · 2026-01-07T07:48:37Z

@rvz16 Hi! Thank you for addressing most of the requests. However, could you please clarify a few remaining modifications?

Do we really need the load_stat_calculator function in src/lm_polygraph/stat_calculators/greedy_probs.py?
Why was the condition at line 225 in src/lm_polygraph/utils/manager.py changed?

Could you also fix the lint issues, please?

IINemo assigned ArtemVazh Dec 18, 2025

ArtemVazh requested changes Dec 26, 2025

rvz16 added 10 commits December 30, 2025 00:15

Added BayesPE base version

c69b1d6

Refactored code for BayesPE + tests

8b35ac6

Changed methods and refactored code

71e6887

Changed library and tests

00b2fe8

Changed comments

00c3bdd

Added BayesPE base version

55dfee5

Refactored code for BayesPE + tests

a99b0b8

Changed methods and refactored code

38f42ed

feat: added bayespe integration

99329be

feat: fleak8 changes + ensemble scaler fixed

33e29ba

rvz16 force-pushed the bayespe branch from 2ba8756 to 33e29ba Compare December 29, 2025 22:18

feat: add requested to PR changes

9b909c1

rvz16 requested a review from ArtemVazh December 30, 2025 00:32

feat: deleted load_stat_calculator in greedy_probs.py, replaced back condition in manager.py and fixed lint issue

38df5b7

ArtemVazh approved these changes Jan 8, 2026

rvz16 wants to merge 12 commits into IINemo:main from rvz16:bayespe
