Add curriculum / hierarchical-RL extension on top of v2 #223

Open
AviralDeshratna11 wants to merge 2 commits into PWhiddy:master from AviralDeshratna11:curriculum-hierarchical-rl
@AviralDeshratna11

Adds a practical curriculum-learning system layered on the existing v2 RedGymEnv without modifying it (baseline_fast_v2.py / red_gym_env_v2.py untouched):

  • curriculum/ram_map.py: single-source RAM reader (GameState)
  • curriculum/structured_obs.py: normalized structured observation vector + space
  • curriculum/milestones.py: 33-milestone curriculum graph + MilestoneManager
  • curriculum/rewards.py: configurable, logged rewards (one-time/max-based) + anti-loop
  • curriculum/scripted_helpers.py: deterministic dialogue/heal/shop macros
  • curriculum/skills.py: option-policy scaffold + high-level controller stub
  • curriculum/state_store.py: per-milestone success-state snapshots (multiproc-safe)
  • curriculum/config.py: YAML/JSON config dataclasses
  • curriculum/tb_callback.py: num-envs-agnostic TensorBoard callback
  • curriculum_env.py: CurriculumRedGymEnv subclass wiring it together
  • train_full_game_curriculum.py: staged PPO driver (headless + visual modes)
  • run_curriculum_interactive.py: watch/evaluate a policy
  • configs/curriculum.yaml, requirements-curriculum.txt, README_curriculum.md
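
The code for `curriculum/rewards.py` is not shown in this conversation. To illustrate what "one-time/max-based" rewards plus an anti-loop term could look like, here is a minimal sketch. The class name `RewardTracker`, its methods, and the default values are all hypothetical and are not taken from the PR.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RewardTracker:
    """Hypothetical reward bookkeeping; names and defaults are illustrative."""

    loop_window: int = 8        # number of recent positions to remember (assumed)
    loop_penalty: float = 0.1   # penalty for revisiting a remembered position (assumed)
    claimed: set = field(default_factory=set)   # one-time events already paid out
    best: dict = field(default_factory=dict)    # best value seen per max-based key
    recent: deque = field(default_factory=deque)

    def one_time(self, key: str, amount: float) -> float:
        """Pay `amount` the first time `key` occurs, then never again."""
        if key in self.claimed:
            return 0.0
        self.claimed.add(key)
        return amount

    def max_based(self, key: str, value: float, scale: float = 1.0) -> float:
        """Pay only for improvement over the best value seen so far."""
        prev = self.best.get(key, 0.0)
        if value <= prev:
            return 0.0
        self.best[key] = value
        return (value - prev) * scale

    def anti_loop(self, position: tuple) -> float:
        """Return a negative reward if `position` was visited recently."""
        penalty = -self.loop_penalty if position in self.recent else 0.0
        self.recent.append(position)
        if len(self.recent) > self.loop_window:
            self.recent.popleft()
        return penalty


tracker = RewardTracker()
first_badge = tracker.one_time("badge_1", 5.0)     # paid once
repeat_badge = tracker.one_time("badge_1", 5.0)    # already claimed
level_gain = tracker.max_based("level_sum", 10.0)  # improvement over 0
level_drop = tracker.max_based("level_sum", 8.0)   # no improvement
```

Paying only on first occurrence or on a new maximum keeps the agent from farming the same event repeatedly, which is the usual motivation for this kind of shaping.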

No ROM is included or distributed; user provides PokemonRed.gb at the repo root.
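
The description calls the per-milestone snapshot store "multiproc-safe" without saying how. One common way to get that property when several environment processes write to the same directory is to write each snapshot to a temporary file and then rename it atomically. The sketch below assumes that approach; `save_snapshot`, `list_snapshots`, the `.state` suffix, and the directory layout are hypothetical and may differ from `curriculum/state_store.py`.

```python
import os
import tempfile
import uuid
from pathlib import Path


def save_snapshot(root: Path, milestone: str, state: bytes) -> Path:
    """Write `state` under root/milestone/ so readers never see a partial file."""
    target_dir = root / milestone
    target_dir.mkdir(parents=True, exist_ok=True)
    # Temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(state)
        # A random name avoids collisions between worker processes.
        final = target_dir / f"{uuid.uuid4().hex}.state"
        os.replace(tmp_name, final)  # atomic rename
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final


def list_snapshots(root: Path, milestone: str) -> list:
    """Return completed snapshots only; in-progress .tmp files are ignored."""
    return sorted((root / milestone).glob("*.state"))
```

A worker resetting into a milestone could then pick any path from `list_snapshots(...)` and load its bytes, without coordinating with the processes that are still writing.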

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

PWhiddy commented Jun 14, 2026

Owner

does it work?

AviralDeshratna11 commented Jun 14, 2026 via email

Author

PWhiddy commented Jun 14, 2026

Owner

Ofc! If you run v2 with no changes, it should eventually get far past that, all the way to the S.S. Anne. Feel free to ask questions in the Discord!

@AviralDeshratna11

Author

Ok, thanks. The agent is running; I will share updates when I reach the S.S. Anne, and if my pipeline works beyond that.

…when stuck to update the weights and rewards and policy
2 participants