From a1751b466ae35898295197d7a0c54e830dea3a4b Mon Sep 17 00:00:00 2001
From: xueyuan
Date: Sun, 31 May 2026 10:42:14 +0800
Subject: [PATCH] docs: clarify reward kernel flags

---
 docs/user/reward.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/user/reward.rst b/docs/user/reward.rst
index 684eaccaf..463412df1 100644
--- a/docs/user/reward.rst
+++ b/docs/user/reward.rst
@@ -28,6 +28,13 @@ Customization of the reward
 In grid2op you can customize the reward function / reward kernel used by your agent. By default, when you create an
 environment a reward has been specified for you by the creator of the environment and you have nothing to do:
 
+.. note::
+    In the mathematical MDP notation, the reward kernel is often written as a function of the state,
+    the next state and the action. In grid2op's implementation, reward classes also receive contextual
+    flags such as `has_error`, `is_illegal` and `is_ambiguous`. These flags make it possible to distinguish
+    the original action submitted by the agent from the action effectively applied by the environment, for
+    example when an out-of-bounds redispatching action is replaced by a do-nothing action.
+
 .. code-block:: python
 
     import grid2op
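To illustrate the flags described in the added note, here is a minimal sketch of a reward that reacts to them. It is not part of the patch. The class name `FlagAwareReward` and the chosen reward values are hypothetical, and the `__call__` signature is an assumption based on how grid2op's `BaseReward` is usually subclassed. The sketch deliberately avoids importing grid2op so that it runs on its own; in a real environment the class would presumably inherit from `grid2op.Reward.BaseReward`.

```python
class FlagAwareReward:
    """Hypothetical reward that penalizes actions the environment did not apply as submitted.

    Assumption: grid2op calls a reward as
    reward(action, env, has_error, is_done, is_illegal, is_ambiguous).
    """

    def __init__(self):
        # Illustrative bounds only, not values taken from grid2op.
        self.reward_min = -1.0
        self.reward_max = 1.0

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        # When any of these flags is set, the action seen by the grid may differ
        # from the one the agent submitted (for example it may have been replaced
        # by a do-nothing action), so the lowest reward is returned.
        if has_error or is_illegal or is_ambiguous:
            return self.reward_min
        return self.reward_max


reward = FlagAwareReward()

# The action and env arguments are unused by this sketch, so None stands in for them.
penalized = reward(action=None, env=None, has_error=False,
                   is_done=False, is_illegal=True, is_ambiguous=False)
accepted = reward(action=None, env=None, has_error=False,
                  is_done=False, is_illegal=False, is_ambiguous=False)
```

The reward here depends only on the flags, which keeps the example short. A practical reward would normally combine them with quantities read from `env`.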