From a1751b466ae35898295197d7a0c54e830dea3a4b Mon Sep 17 00:00:00 2001
From: xueyuan <xueyuan@tode.com>
Date: Sun, 31 May 2026 10:42:14 +0800
Subject: [PATCH] docs: clarify reward kernel flags

---
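Note for reviewers (not part of the commit message): the flags described in the
added note can be illustrated with a minimal standalone sketch. It does not
import grid2op. `FlagAwareReward` is a hypothetical class, and its `__call__`
argument order (`action, env, has_error, is_done, is_illegal, is_ambiguous`) is
assumed to mirror what grid2op reward classes receive. The reward values are
arbitrary placeholders.

```python
class FlagAwareReward:
    """Hypothetical reward; the call signature is an assumption, see above."""

    def __init__(self, reward_min=-1.0, reward_max=1.0):
        self.reward_min = reward_min
        self.reward_max = reward_max

    def __call__(self, action, env, has_error, is_done, is_illegal, is_ambiguous):
        # An error occurred during the step: return the lowest reward.
        if has_error:
            return self.reward_min
        # The submitted action was rejected, so a do-nothing action was
        # applied instead: return a neutral reward.
        if is_illegal or is_ambiguous:
            return 0.0
        # The submitted action was applied as-is.
        return self.reward_max


reward = FlagAwareReward()
# None stands in for the real action and environment objects in this sketch.
print(reward(None, None, has_error=False, is_done=False,
             is_illegal=False, is_ambiguous=True))
```

A real reward would subclass the library's base reward class and inspect `env`;
the sketch only shows how the flags can separate a rejected action from an
applied one.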
 docs/user/reward.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/user/reward.rst b/docs/user/reward.rst
index 684eaccaf..463412df1 100644
--- a/docs/user/reward.rst
+++ b/docs/user/reward.rst
@@ -28,6 +28,13 @@ Customization of the reward
 In grid2op you can customize the reward function / reward kernel used by your agent. By default, when you create an
 environment a reward has been specified for you by the creator of the environment and you have nothing to do:
 
+.. note::
+    In mathematical MDP notation, the reward kernel is usually written as a function of the state,
+    the action and the next state. In grid2op's implementation, reward classes also receive contextual
+    flags such as ``has_error``, ``is_illegal`` and ``is_ambiguous``. The last two signal that the action
+    submitted by the agent is not the one actually applied by the environment, for example when an
+    out-of-bounds redispatching action is replaced by a do-nothing action.
+
 .. code-block:: python
 
     import grid2op