BayesPE integration #409

Open
rvz16 wants to merge 12 commits into IINemo:main from rvz16:bayespe

rvz16 (Contributor) commented on Dec 14, 2025

Add an implementation of "Bayesian Prompt Ensembles: Model Uncertainty Estimation for Black-Box Large Language Models" (https://aclanthology.org/2024.findings-acl.728.pdf).
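
For orientation, the core idea can be sketched as a weighted average of per-prompt class probabilities. This is only an illustration, assuming the weighted prompt-ensemble formulation described in the paper. The function names `combine_prompt_probs` and `entropy`, the example probabilities, and the fixed weights are hypothetical and are not taken from this PR. In the paper the weights are learned, and that step is omitted here.

```python
import math


def combine_prompt_probs(per_prompt_probs, weights):
    """Weighted average of class probabilities produced under different prompts.

    per_prompt_probs: one probability vector per prompt (hypothetical inputs).
    weights: one non-negative weight per prompt; normalised here to sum to 1.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_classes = len(per_prompt_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(norm, per_prompt_probs))
        for c in range(n_classes)
    ]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector, one possible uncertainty score."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Three prompts, two classes; all numbers are made up for illustration.
per_prompt_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
weights = [0.5, 0.3, 0.2]

combined = combine_prompt_probs(per_prompt_probs, weights)
uncertainty = entropy(combined)
print(combined, uncertainty)
```

Prompts that disagree pull the combined distribution towards uniform, which raises the entropy of the ensemble prediction.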

ArtemVazh (Collaborator) left a comment

Please rebase the branch onto the latest main. Additionally, please add these methods to the README and the default_estimators config.

)

elif model_type == "Whitebox":
    from lm_polygraph.stat_calculators.ensemble_probs import EnsembleProbsCalculator

Collaborator

EnsembleProbsCalculator should be specified in the __init__ of lm_polygraph.stat_calculators and removed from here.

Comment thread src/lm_polygraph/defaults/register_default_stat_calculators.py
get_linear_schedule_with_warmup,
)

try: # Transformers >=4.46 removes AdamW from top-level

Collaborator

This issue has already been resolved in main. from torch.optim import AdamW should work with the package versions in requirements.

return (["ensemble_probs"], ["input_texts"])


def load_stat_calculator(config, environment):

Collaborator

It should be placed in a separate file, as is done for all other calculators here: lm_polygraph/defaults/stat_calculator_builders

return result_dict


def load_stat_calculator(config, environment):

Collaborator

Is this needed here? It doesn’t seem to be used.

Comment thread src/lm_polygraph/utils/manager.py Outdated
)
if not (
    str(s).startswith("blackbox_") and s[len("blackbox_") :] in have_stats
)  # remove blackbox_X from stats only if X is already in stats, to avoid running the stat calculator twice

Collaborator

It’s unclear why this modification is necessary.
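
For reference, the behaviour described by the inline comment in the hunk above can be sketched in isolation. This is a minimal sketch, not the code in manager.py: the helper name `drop_duplicate_blackbox_stats` and the example stat names are assumptions made for illustration, and only the condition itself mirrors the quoted hunk.

```python
def drop_duplicate_blackbox_stats(stats, have_stats):
    """Drop blackbox_X only when X is already available; keep everything else.

    Hypothetical helper wrapping the condition quoted in the hunk.
    """
    prefix = "blackbox_"
    return [
        s
        for s in stats
        if not (str(s).startswith(prefix) and str(s)[len(prefix):] in have_stats)
    ]


# Illustrative inputs: greedy_texts is already available, sample_texts is not.
stats = ["greedy_texts", "blackbox_greedy_texts", "blackbox_sample_texts"]
have_stats = {"greedy_texts"}

filtered = drop_duplicate_blackbox_stats(stats, have_stats)
print(filtered)  # ['greedy_texts', 'blackbox_sample_texts']
```

Under this reading, a blackbox_ stat with no non-blackbox counterpart in have_stats is kept, so its calculator still runs.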

Comment thread src/lm_polygraph/utils/model.py
Comment thread test/test_bayespe_integration.py Outdated
from lm_polygraph.utils.model import WhiteboxModel


def test_bayespe_zero_shot_end_to_end():

Collaborator

This test should be moved to test/local.

Comment thread test/test_estimators.py Outdated
    estimator = Focus(
        model_name=model_name,
-        path="../token_idf/{model_name}/token_idf.pkl",
+        path=f"../focus_data/{model_name}/token_idf.pkl",

Collaborator

It’s unclear why this modification is necessary.
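
One visible effect of the change, separate from the directory rename, concerns the string prefix. Assuming the first path= line in the hunk is the removed one and the second is the added one, the old literal has no f prefix, so {model_name} is never substituted. A short sketch with an illustrative model name (not taken from the test file):

```python
model_name = "example-model"  # illustrative value only

# Plain string literal: the braces are kept verbatim.
plain = "../token_idf/{model_name}/token_idf.pkl"

# f-string: the braces are replaced by the variable's value.
formatted = f"../focus_data/{model_name}/token_idf.pkl"

print(plain)      # ../token_idf/{model_name}/token_idf.pkl
print(formatted)  # ../focus_data/example-model/token_idf.pkl
```

Whether the move from token_idf to focus_data is also intended is a separate question that the sketch does not address.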

Comment thread test/test_estimators.py Outdated
assert isinstance(ue.uncertainty, float)


def test_bayespe_zero_shot():

Collaborator

It would be better for test_bayespe_zero_shot and test_bayespe_few_shot to go through the standard estimate_uncertainty function. The current versions can be kept as additional tests.

rvz16 requested a review from ArtemVazh on December 30, 2025, 00:32

ArtemVazh (Collaborator) commented on Jan 7, 2026

@rvz16 Hi! Thank you for addressing most of the requests. However, could you please clarify a few remaining modifications?

  1. Do we really need the load_stat_calculator function in src/lm_polygraph/stat_calculators/greedy_probs.py?
  2. Why was the condition at line 225 in src/lm_polygraph/utils/manager.py changed?

Could you also fix the lint issues, please?

…condition in manager.py and fixed lint issue