Skip to content

Prioritized Experience Replay Buffer Support#106

Merged
romerojosh merged 9 commits into
masterfrom
tkurth/rl-prioritized-replay-buffer
Jun 4, 2026
Merged

Prioritized Experience Replay Buffer Support#106
romerojosh merged 9 commits into
masterfrom
tkurth/rl-prioritized-replay-buffer

Conversation

@azrael417
Copy link
Copy Markdown
Collaborator

This MR adds a prioritized replay buffer, which allows the algorithms to assign weights to individual samples according to importance. This seems to be one important ingredient in modern RL training so I thought we might want to support it. It looks invasive but actually all it does is feeding some weights to the training algorithms and updating them accordingly. The interface is universal though, regular replay buffer still works and uses uniform weighting.

@azrael417 azrael417 requested a review from romerojosh June 3, 2026 07:42
@azrael417 azrael417 self-assigned this Jun 3, 2026
Signed-off-by: Thorsten Kurth <[email protected]>
@azrael417 azrael417 force-pushed the tkurth/rl-prioritized-replay-buffer branch from f0bad80 to 3f37ba7 Compare June 3, 2026 07:44
Signed-off-by: Thorsten Kurth <[email protected]>
@azrael417 azrael417 changed the title Tkurth/rl prioritized replay buffer Prioritized Experience Replay Buffer Support Jun 3, 2026
@azrael417
Copy link
Copy Markdown
Collaborator Author

/build_and_test

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

🚀 Build workflow triggered! View run

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

❌ Build workflow failed! View run

@azrael417 azrael417 marked this pull request as draft June 3, 2026 10:15
@azrael417 azrael417 marked this pull request as ready for review June 3, 2026 10:43
@azrael417
Copy link
Copy Markdown
Collaborator Author

/build_and_test

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

🚀 Build workflow triggered! View run

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 3, 2026

✅ Build workflow passed! View run

treeUpdate_(t, p_alpha);
priorities_[t] = raw_p;
max_priority_ = std::max(max_priority_, raw_p);
min_p_alpha_ = std::min(min_p_alpha_, p_alpha);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex doesn't like this handling of min_p_alpha_. In its words:

min_p_alpha_ is a running minimum that only decreases. Once a very low-priority transition is updated or later overwritten, all future IS weights can remain scaled by
that stale value, shrinking critic losses and gradients permanently. This should track the current minimum over active positive priorities, for example with a min-tree or recomputation on update/overwrite.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems correct. wow, this is very subtle, but SB and OpenAI use min trees too. Fixed in latest updates. It is simple to maintain since it runs in parallel to the sum tree.

return buffer_[index];
}

bool isReady() const override { return current_size_ >= min_size_; }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another Codex complaint:

This can report ready before the PER tree has any positive-priority leaf when min_size < nstep. sample() then sees treeTotal_() == 0 and loops forever on zero-priority leaves.
isReady() should require at least one valid priority, or sample() should fail fast on total <= 0.

Copy link
Copy Markdown
Collaborator Author

@azrael417 azrael417 Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a super exotic edge case but I agree, this should be fixed. Done now.

Comment thread requirements.txt Outdated
azrael417 added 3 commits June 3, 2026 22:11
…unavailability of samples

Signed-off-by: Thorsten Kurth <[email protected]>
Signed-off-by: Thorsten Kurth <[email protected]>
Copy link
Copy Markdown
Collaborator

@romerojosh romerojosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@romerojosh romerojosh merged commit d09e039 into master Jun 4, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants