Add curriculum / hierarchical-RL extension on top of v2 by AviralDeshratna11 · Pull Request #223 · PWhiddy/PokemonRedExperiments

AviralDeshratna11 · 2026-06-13T20:28:22Z

Adds a practical curriculum-learning system layered on the existing v2 RedGymEnv without modifying it (baseline_fast_v2.py / red_gym_env_v2.py untouched):

- curriculum/ram_map.py: single-source RAM reader (GameState)
- curriculum/structured_obs.py: normalized structured observation vector + space
- curriculum/milestones.py: 33-milestone curriculum graph + MilestoneManager
- curriculum/rewards.py: configurable, logged rewards (one-time/max-based) + anti-loop
- curriculum/scripted_helpers.py: deterministic dialogue/heal/shop macros
- curriculum/skills.py: option-policy scaffold + high-level controller stub
- curriculum/state_store.py: per-milestone success-state snapshots (multiproc-safe)
- curriculum/config.py: YAML/JSON config dataclasses
- curriculum/tb_callback.py: num-envs-agnostic TensorBoard callback
- curriculum_env.py: CurriculumRedGymEnv subclass wiring it together
- train_full_game_curriculum.py: staged PPO driver (headless + visual modes)
- run_curriculum_interactive.py: watch/evaluate a policy
- configs/curriculum.yaml, requirements-curriculum.txt, README_curriculum.md

No ROM is included or distributed; user provides PokemonRed.gb at the repo root.
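
For readers unfamiliar with the "one-time/max-based" and "anti-loop" reward terms listed above, here is a minimal sketch of what such a scheme could look like. It is an illustration only, not the code in curriculum/rewards.py: the class name `RewardTracker`, its methods, the event and metric names, and the default window and penalty values are all assumptions made for this example.

```python
from collections import deque


class RewardTracker:
    """Hypothetical sketch of one-time, max-based, and anti-loop rewards."""

    def __init__(self, loop_window=8, loop_penalty=-0.1):
        self.claimed = set()                      # one-time events already paid out
        self.best = {}                            # best value seen per metric
        self.recent = deque(maxlen=loop_window)   # recently visited positions
        self.loop_penalty = loop_penalty

    def one_time(self, event, value):
        """Pay `value` the first time `event` occurs, 0.0 on every repeat."""
        if event in self.claimed:
            return 0.0
        self.claimed.add(event)
        return value

    def max_based(self, metric, current, scale=1.0):
        """Pay only for improvement over the best value seen so far."""
        previous = self.best.get(metric, 0)
        if current <= previous:
            return 0.0
        self.best[metric] = current
        return scale * (current - previous)

    def anti_loop(self, position):
        """Return a penalty if `position` was visited within the recent window."""
        penalty = self.loop_penalty if position in self.recent else 0.0
        self.recent.append(position)
        return penalty


tracker = RewardTracker()
first = tracker.one_time("example_event", 5.0)     # paid once
repeat = tracker.one_time("example_event", 5.0)    # nothing on the repeat
gain = tracker.max_based("example_metric", 12)     # improvement over 0
no_gain = tracker.max_based("example_metric", 10)  # below the best, no reward
```

The point of the max-based form is that a reward which can be re-triggered indefinitely gives the agent something to farm, whereas paying only for a new best value removes that incentive once the value stops improving.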

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

PWhiddy · 2026-06-14T18:25:36Z

does it work?

AviralDeshratna11 · 2026-06-14T19:01:29Z

I ran the agent. So far it has defeated Brock and entered the cave, but it couldn't find the way out and started exploiting the battle reward. To stop this I implemented MARL and increased exploration over exploitation, which has produced some progress. I will update the model. Can I reach out to you if I need your advice? By the way, a huge thanks to you for this inspiring work. I want to take your baseline model and make this RL agent clear the entire game.


PWhiddy · 2026-06-14T19:51:34Z

ofc! If you run v2 with no changes it should eventually get far past that, all the way to the S.S. Anne. Feel free to ask questions in the Discord!

AviralDeshratna11 · 2026-06-14T21:14:32Z

OK, thanks. The agent is running. I will share updates when I reach the S.S. Anne, and if my pipeline works beyond that.

Added LLM decision (local model nemotron mini) to counter situations when stuck to update the weights and rewards and policy

e6886b0

AviralDeshratna11 wants to merge 2 commits into PWhiddy:master from AviralDeshratna11:curriculum-hierarchical-rl
