0012: Auditability for Authorization Changes
############################################

Status
******

**Draft**

Context
*******

The existing architecture (see `ADR 0005`_) introduced ``ExtendedCasbinRule``, which adds
``created_at``, ``updated_at``, and a ``metadata`` JSON field to the ``CasbinRule`` table.
These fields do not form an audit trail: they record no actor and no operation type, and
they give downstream consumers no mechanism to react to changes.

As the framework is adopted across more Open edX services, operators and developers need
answers the current system cannot provide:

- Who assigned this role, and when?
- Who removed a user's access, and was it intentional?
- Why was a permission check denied?

A spike (OEPM-Spike: RBAC AuthZ Auditability) examined how peer systems approach this.
Auditability decomposes into three dimensions:

1. **Attribution**: who changed access? (role assignments, removals)
2. **Explainability**: why was access granted or denied? (policy evaluation at check time)
3. **Usage**: who used access? (resource access events, business operations)

SpiceDB and OpenFGA version the entire authorization graph, enabling historical
reconstruction. Keycloak uses event listeners on administrative actions. openedx-authz sits
between these: a mutable policy store with no built-in audit layer.

The pycasbin ecosystem has no audit plugin, and ``casbin-django-orm-adapter`` has no
mechanism for change tracking. ``WatcherEx`` provides rule-level hooks but carries no actor
context and does not cover update operations.

Two transitive dependencies already cover what is needed:

- **django-crum** (``0.7.9``, via ``edx-django-utils``): ``crum.get_current_user()`` returns
  the current request's user from thread-local storage. It returns ``None`` outside a
  request, which this ADR treats as a system actor.
- **django-simple-history** (``3.11.0``, via ``edx-organizations``): model-level change
  tracking with actor, timestamp, and before/after state. It is not yet applied to any
  openedx-authz model.

The Auth0 FGA Logging API (October 2025) frames authorization auditing around three
questions, which this ADR adopts as acceptance criteria:

- Who made a permission change? (attribution)
- What did a user access or attempt? (explainability + usage)
- Can logs be exported to external systems? (SIEM, Aspects)

Decision
********

Three independent mechanisms, each answering a different question:

- ``OpenedxPublicSignal``: something happened, react now
- ``RoleAssignmentAudit``: what happened, in what order, performed by whom
- ``django-simple-history`` on ``ExtendedCasbinRule``: what was the full state at time T
  (future work)

Attribution: Role Lifecycle Events and Audit Table
==================================================

Emit an ``OpenedxPublicSignal`` from ``openedx_authz.api.roles`` after every successful role
assignment or removal. Register the emission with ``transaction.on_commit`` so the event fires
only after the Casbin write has committed, and never for a rolled-back write. A Celery handler
writes the event to ``RoleAssignmentAudit``.

The handler is enabled by default. Operators with Aspects or a SIEM can disable it with a
Django setting to avoid a redundant write. If the handler fails, neither the Casbin write nor
the event is affected.

.. note::

   Whether to write to the audit table in the same process (no Celery) or through a separate
   task is an open question. It needs latency benchmarking before implementation.
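
The ordering guarantees above can be modelled without Django. The following is a minimal plain-Python sketch, not the openedx-authz implementation. ``FakeTransaction`` stands in for Django's transaction handling, a list stands in for the Celery-backed audit write, and ``AUTHZ_AUDIT_HANDLER_ENABLED``, the ``("g", user, role, scope)`` row shape, and every other name are hypothetical. The sketch shows three properties: the event fires only after commit, nothing fires on rollback, and a failing receiver neither undoes the policy write nor blocks the other receivers.

.. code:: python

    from typing import Callable, Dict, List, Tuple

    Row = Tuple[str, str, str, str]  # hypothetical ("g", user, role, scope) row
    Event = Dict[str, str]


    class FakeTransaction:
        """Stand-in for a DB transaction with Django-like on_commit semantics."""

        def __init__(self, store: List[Row]) -> None:
            self.store = store  # committed policy rows
            self._pending: List[Row] = []
            self._callbacks: List[Callable[[], None]] = []

        def write(self, row: Row) -> None:
            self._pending.append(row)

        def on_commit(self, callback: Callable[[], None]) -> None:
            self._callbacks.append(callback)

        def commit(self) -> None:
            self.store.extend(self._pending)
            callbacks, self._pending, self._callbacks = self._callbacks, [], []
            for callback in callbacks:  # runs only after the write is durable
                callback()

        def rollback(self) -> None:
            self._pending, self._callbacks = [], []  # callbacks are discarded


    SETTINGS = {"AUTHZ_AUDIT_HANDLER_ENABLED": True}  # hypothetical setting name
    audit_rows: List[Event] = []
    errors: List[str] = []


    def audit_receiver(event: Event) -> None:
        # In the ADR this is a Celery task; here it appends to a list.
        if SETTINGS["AUTHZ_AUDIT_HANDLER_ENABLED"]:
            audit_rows.append(event)


    def failing_receiver(event: Event) -> None:
        raise RuntimeError("plugin outage")  # simulates a broken plugin receiver


    receivers: List[Callable[[Event], None]] = [failing_receiver, audit_receiver]


    def emit_role_event(event: Event) -> None:
        # Best-effort: each receiver's failure is recorded, never re-raised.
        for receiver in receivers:
            try:
                receiver(event)
            except Exception as exc:
                errors.append(f"{receiver.__name__}: {exc}")


    def assign_role(txn: FakeTransaction, user: str, role: str, scope: str) -> None:
        txn.write(("g", user, role, scope))  # the Casbin policy write
        event = {"operation": "ASSIGN", "user": user, "role": role, "scope": scope}
        txn.on_commit(lambda: emit_role_event(event))  # deferred until commit


    store: List[Row] = []
    committed = FakeTransaction(store)
    assign_role(committed, "user^alice", "role^instructor", "course-v1^course-v1:Org+Course+Run")
    committed.commit()

    aborted = FakeTransaction(store)
    assign_role(aborted, "user^bob", "role^instructor", "course-v1^course-v1:Org+Course+Run")
    aborted.rollback()

Each receiver is dispatched inside its own ``try`` block, which mirrors the best-effort stance described under Consequences: a broken plugin costs one missing reaction, not a lost role assignment.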

Event payload
-------------

.. code:: python

    {
        "operation": "<ASSIGN or REMOVE>",
        "user": "<namespaced subject key, e.g. user^alice>",
        "role": "<namespaced role key, e.g. role^instructor>",
        "scope": "<namespaced scope key, e.g. course-v1^course-v1:Org+Course+Run>",
        "actor": "<username of the caller, or None for system actor>",
        "timestamp": "<ISO 8601 UTC datetime>",
    }

The actor is resolved from ``crum.get_current_user()`` (provided by ``django-crum``) at API
call time, so callers do not need to pass ``actor=`` explicitly.
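
A minimal sketch of assembling this payload follows. It assumes a hypothetical ``build_role_event`` helper. To keep the sketch runnable without Django, the resolved user is passed in as ``current_user``; in the design above that value would come from ``crum.get_current_user()``.

.. code:: python

    from datetime import datetime, timezone
    from types import SimpleNamespace
    from typing import Any, Dict, Optional

    OPERATIONS = {"ASSIGN", "REMOVE"}


    def build_role_event(
        operation: str,
        user: str,
        role: str,
        scope: str,
        current_user: Optional[Any],
        now: Optional[datetime] = None,
    ) -> Dict[str, Optional[str]]:
        """Builds the role lifecycle payload (hypothetical helper)."""
        if operation not in OPERATIONS:
            raise ValueError(f"unknown operation: {operation!r}")
        moment = now if now is not None else datetime.now(timezone.utc)
        return {
            "operation": operation,
            "user": user,
            "role": role,
            "scope": scope,
            # No request user (management command, background task): system actor.
            "actor": current_user.username if current_user is not None else None,
            "timestamp": moment.astimezone(timezone.utc).isoformat(),
        }


    scope = "course-v1^course-v1:Org+Course+Run"
    system_event = build_role_event("REMOVE", "user^alice", "role^instructor", scope, None)
    staff_event = build_role_event(
        "ASSIGN", "user^alice", "role^instructor", scope, SimpleNamespace(username="admin")
    )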

Audit table
-----------

``RoleAssignmentAudit`` mirrors the event payload. It is registered in the Django admin and
is filterable by user, role, scope, actor, and timestamp.
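
The table's shape and its admin filters can be sketched in plain Python. The sketch assumes a hypothetical ``AuditRecord`` dataclass whose fields mirror the payload and a hypothetical ``filter_audit`` function standing in for the admin filters. In Django these would more likely be a model and a ``ModelAdmin`` with ``list_filter``. The example answers the "full role history for a given user" use case.

.. code:: python

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Iterable, List, Optional


    @dataclass(frozen=True)
    class AuditRecord:
        """One audit row; fields mirror the role lifecycle event payload."""

        operation: str
        user: str
        role: str
        scope: str
        actor: Optional[str]  # None marks a system operation
        timestamp: datetime


    def filter_audit(
        records: Iterable[AuditRecord],
        user: Optional[str] = None,
        role: Optional[str] = None,
        scope: Optional[str] = None,
        actor: Optional[str] = None,
    ) -> List[AuditRecord]:
        """Returns matching rows oldest first; a None argument means "do not filter"."""
        wanted = {"user": user, "role": role, "scope": scope, "actor": actor}
        matches = [
            record
            for record in records
            if all(value is None or getattr(record, name) == value for name, value in wanted.items())
        ]
        return sorted(matches, key=lambda record: record.timestamp)


    COURSE = "course-v1^course-v1:Org+Course+Run"
    rows = [
        AuditRecord("REMOVE", "user^alice", "role^instructor", COURSE, None, datetime(2025, 3, 2)),
        AuditRecord("ASSIGN", "user^alice", "role^instructor", COURSE, "admin", datetime(2025, 3, 1)),
        AuditRecord("ASSIGN", "user^bob", "role^instructor", COURSE, "admin", datetime(2025, 3, 1)),
    ]
    alice_history = filter_audit(rows, user="user^alice")

Because ``None`` means "no filter" here, this sketch cannot select system-actor rows on its own. A real admin filter would need an explicit choice for that case.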

Developer extensibility
-----------------------

Plugin authors register receivers on the ``OpenedxPublicSignal`` to react to role lifecycle
events (notifications, cache updates, analytics). ``OpenedxPublicSignal`` is a Django signal,
so developers without an event bus can connect receivers to it directly. If an event bus is
configured to produce this event, events are also forwarded to Aspects or other external
systems without changes to openedx-authz.
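
The sketch below shows a plugin receiver that drops a cached permission entry and queues a notification on each role change. ``MiniSignal`` stands in for a Django signal so the example runs without openedx-events. ``ROLE_CHANGED``, ``permission_cache``, and the ``event`` keyword argument are hypothetical, because this ADR does not fix the signal's name or its arguments. In a real plugin, the function would be connected with Django's ``@receiver`` decorator instead.

.. code:: python

    from collections import defaultdict
    from typing import Callable, DefaultDict, Dict, List, Set

    Event = Dict[str, str]


    class MiniSignal:
        """Minimal stand-in for a Django signal: connect receivers, send to all."""

        def __init__(self) -> None:
            self._receivers: List[Callable[..., None]] = []

        def connect(self, func: Callable[..., None]) -> Callable[..., None]:
            self._receivers.append(func)
            return func  # returning func lets connect() double as a decorator

        def send(self, sender: object, **kwargs: object) -> None:
            for func in self._receivers:
                func(sender=sender, **kwargs)


    ROLE_CHANGED = MiniSignal()  # hypothetical; the ADR does not name the signal

    # Hypothetical plugin state: cached permission sets keyed by subject.
    permission_cache: DefaultDict[str, Set[str]] = defaultdict(set)
    notifications: List[str] = []


    @ROLE_CHANGED.connect
    def on_role_changed(sender: object, event: Event, **kwargs: object) -> None:
        # Drop cached permissions so the next check re-reads the policy.
        permission_cache.pop(event["user"], None)
        verb = "granted" if event["operation"] == "ASSIGN" else "revoked"
        notifications.append(f"{event['role']} {verb} for {event['user']} in {event['scope']}")


    permission_cache["user^alice"].add("view_course")  # hypothetical cached permission
    ROLE_CHANGED.send(
        sender=None,
        event={
            "operation": "REMOVE",
            "user": "user^alice",
            "role": "role^instructor",
            "scope": "course-v1^course-v1:Org+Course+Run",
        },
    )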

Explainability: Real-Time Decision Context
==========================================

Expose pycasbin's ``enforce_ex()`` through the public Python API. It returns
``(result, explain_rule)``: the boolean decision and the policy rule that matched. Callers get
the exact rule behind the decision. When no rule matched (for example, a request denied
because nothing allowed it), ``explain_rule`` is empty.
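
The ``(result, explain_rule)`` contract can be illustrated with a small stand-in. This is not pycasbin. ``enforce_ex_sketch`` is hypothetical and assumes allow-only policies, so the decision is true if any rule matches. It also uses made-up ``g``/``p`` tuple layouts and an ``act^`` action prefix rather than openedx-authz's actual model.

.. code:: python

    from typing import List, Sequence, Tuple

    # Hypothetical rule layouts for this sketch only:
    #   g rows: (subject, role, scope)  -> subject holds role within scope
    #   p rows: (role, action, scope)   -> role may perform action within scope
    GRow = Tuple[str, str, str]
    PRow = Tuple[str, str, str]


    def enforce_ex_sketch(
        g: Sequence[GRow], p: Sequence[PRow], subject: str, action: str, scope: str
    ) -> Tuple[bool, List[str]]:
        """Returns (decision, matched_rule); allow-only, so a denial explains nothing."""
        roles = {role for sub, role, sc in g if sub == subject and sc == scope}
        for role, act, sc in p:
            if role in roles and act == action and sc == scope:
                return True, [role, act, sc]  # the rule that granted access
        return False, []  # no rule matched: denied, empty explanation


    COURSE = "course-v1^course-v1:Org+Course+Run"
    g_rules = [("user^alice", "role^instructor", COURSE)]
    p_rules = [("role^instructor", "act^edit_content", COURSE)]

    alice_decision = enforce_ex_sketch(g_rules, p_rules, "user^alice", "act^edit_content", COURSE)
    bob_decision = enforce_ex_sketch(g_rules, p_rules, "user^bob", "act^edit_content", COURSE)

Under an allow-only effect, an empty explanation is itself informative: it tells the caller that no rule granted the request, rather than that some rule forbade it.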

Enforcement events are opt-in via ``AUTHZ_ENFORCEMENT_EVENTS_ENABLED``. When enabled, each
check fires an ``OpenedxPublicSignal`` that plugin consumers or an event bus can receive. No
audit table is written: storing one row per permission check is impractical at that volume.
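
The sketch below shows the opt-in gate. It assumes a hypothetical ``check_permission`` wrapper that takes its enforcer, settings mapping, and emitter as arguments so it runs standalone. The enforcement event fields (``allowed``, ``matched_rule``) are illustrative, because this ADR does not define that payload.

.. code:: python

    from typing import Callable, Dict, List, Mapping, Tuple

    Decision = Tuple[bool, List[str]]
    EnforcementEvent = Dict[str, object]


    def check_permission(
        enforce: Callable[[str, str, str], Decision],
        settings: Mapping[str, object],
        emit: Callable[[EnforcementEvent], None],
        subject: str,
        action: str,
        scope: str,
    ) -> bool:
        """Runs the check and, only if enabled, emits an enforcement event."""
        allowed, matched_rule = enforce(subject, action, scope)
        if settings.get("AUTHZ_ENFORCEMENT_EVENTS_ENABLED", False):
            emit(
                {
                    "subject": subject,
                    "action": action,
                    "scope": scope,
                    "allowed": allowed,
                    "matched_rule": matched_rule,
                }
            )
        return allowed  # nothing is persisted in either case


    def deny_all(subject: str, action: str, scope: str) -> Decision:
        return False, []  # trivial enforcer for the example


    COURSE = "course-v1^course-v1:Org+Course+Run"
    emitted: List[EnforcementEvent] = []
    check_permission(deny_all, {}, emitted.append, "user^bob", "act^edit_content", COURSE)
    events_when_disabled = len(emitted)
    enabled = {"AUTHZ_ENFORCEMENT_EVENTS_ENABLED": True}
    check_permission(deny_all, enabled, emitted.append, "user^bob", "act^edit_content", COURSE)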

Historical explainability ("why did this user have access last Tuesday?") is deferred. There
are two options. Both require a breaking change so that ``is_user_allowed`` accepts an
``as_of`` argument:

- **Option A (event replay):** Replay ``ASSIGN``/``REMOVE`` events from ``RoleAssignmentAudit``
  up to T. No extra infrastructure is needed; the data is already there once attribution is
  implemented.
- **Option B (snapshots):** Add ``HistoricalRecords()`` to ``ExtendedCasbinRule`` and use
  ``as_of(T)`` for the full rule state, including policy definitions. History collection must
  start before the target timestamp.
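
Option A can be sketched as a fold over the audit events. The sketch assumes a hypothetical ``assignments_as_of`` helper and events whose ``timestamp`` is a timezone-aware ``datetime`` rather than the ISO string in the payload. The fold is only complete if the audit table has recorded every change since the assignments it reconstructs were made. Assignments that predate the table would need a baseline.

.. code:: python

    from datetime import datetime, timezone
    from typing import Iterable, Mapping, Set, Tuple

    Assignment = Tuple[str, str, str]  # (user, role, scope)


    def assignments_as_of(events: Iterable[Mapping[str, object]], as_of: datetime) -> Set[Assignment]:
        """Replays ASSIGN/REMOVE events up to and including as_of."""
        state: Set[Assignment] = set()
        # sorted() is stable, so events sharing a timestamp keep their stored order.
        for event in sorted(events, key=lambda e: e["timestamp"]):
            if event["timestamp"] > as_of:
                break
            key = (event["user"], event["role"], event["scope"])
            if event["operation"] == "ASSIGN":
                state.add(key)
            elif event["operation"] == "REMOVE":
                state.discard(key)
        return state


    UTC = timezone.utc
    COURSE = "course-v1^course-v1:Org+Course+Run"
    events = [
        {"operation": "REMOVE", "user": "user^alice", "role": "role^instructor",
         "scope": COURSE, "timestamp": datetime(2025, 3, 10, tzinfo=UTC)},
        {"operation": "ASSIGN", "user": "user^alice", "role": "role^instructor",
         "scope": COURSE, "timestamp": datetime(2025, 3, 1, tzinfo=UTC)},
    ]
    on_march_5 = assignments_as_of(events, datetime(2025, 3, 5, tzinfo=UTC))
    on_march_11 = assignments_as_of(events, datetime(2025, 3, 11, tzinfo=UTC))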

``authz.policy`` is loaded into the database and is therefore covered by Option B.
``model.conf`` is not persisted, so snapshots alone cannot show whether the model changed
since time T. A ``model_hash`` field on ``ExtendedCasbinRule`` would let historical queries
detect such a change.
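
The sketch below computes ``model_hash`` with SHA-256 over ``model.conf``. It normalizes trailing whitespace and line endings so that cosmetic edits do not register as model changes. The model text is a generic Casbin RBAC model used only for illustration, not openedx-authz's ``model.conf``. ``model_hash`` and ``model_changed_since`` are hypothetical helpers.

.. code:: python

    import hashlib

    MODEL_V1 = "\n".join([
        "[request_definition]",
        "r = sub, obj, act",
        "[policy_definition]",
        "p = sub, obj, act",
        "[role_definition]",
        "g = _, _",
        "[policy_effect]",
        "e = some(where (p.eft == allow))",
        "[matchers]",
        "m = g(r.sub, p.sub) && r.obj == p.obj && r.act == p.act",
    ])
    MODEL_V2 = MODEL_V1.replace("r.obj == p.obj", "keyMatch(r.obj, p.obj)")


    def model_hash(model_conf_text: str) -> str:
        """SHA-256 of model.conf with line endings and trailing whitespace normalized."""
        lines = [line.rstrip() for line in model_conf_text.strip().splitlines()]
        return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


    def model_changed_since(stored_hash: str, current_model_conf: str) -> bool:
        """True if the model in use differs from the one hashed at write time."""
        return stored_hash != model_hash(current_model_conf)


    stored = model_hash(MODEL_V1)  # would be saved alongside each rule at write time
    cosmetic_edit = MODEL_V1.replace("\n", "   \r\n")  # trailing spaces, CRLF endings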

Consequences
************

Attribution
===========

- Operators get a filterable role assignment history in Django admin, with no external
  tooling required.
- Developers get a stable ``OpenedxPublicSignal`` extension point, the first formally defined
  event in openedx-authz.
- Events are best-effort: if the audit write fails, the Casbin policy is still durable.
  Consumers requiring guaranteed delivery must implement their own retry logic.
- ``actor`` is nullable. Non-request contexts (management commands, background tasks) record
  ``None``, which is logged as a system operation.
- No new dependencies are introduced.
- Callers of ``openedx_authz.api.roles`` need no signature changes.

Explainability
==============

- Developers can retrieve the matched policy rule at check time to debug "why was this
  denied?"
- The explanation is point-in-time only; historical explainability is deferred.
- Enforcement events are opt-in by design. Enabling them without an external consumer
  produces events that are emitted and discarded.
- No new dependencies are introduced.

Both flows
==========

- ``RoleAssignmentAudit`` introduces a new migration. No existing table is modified.
- The ``OpenedxPublicSignal`` schema is a public API surface. Field additions are
  backward-compatible; removals and renames are breaking changes.
- Usage auditing belongs at the application layer (Open edX tracking events, Aspects), not
  in the authorization library.
- ``RoleAssignmentAudit`` is not tamper-proof. Compliance-grade immutability is a
  later-phase concern.
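
The schema compatibility rule above can be checked mechanically. The sketch below compares the schema as a set of top-level field names and uses a hypothetical ``classify_schema_change`` helper. A rename shows up as one removal plus one addition, so it is correctly reported as breaking.

.. code:: python

    from typing import Dict, FrozenSet, List

    ROLE_EVENT_V1: FrozenSet[str] = frozenset(
        {"operation", "user", "role", "scope", "actor", "timestamp"}
    )


    def classify_schema_change(old: FrozenSet[str], new: FrozenSet[str]) -> Dict[str, List[str]]:
        """Splits a field-set change into additive (safe) and breaking parts."""
        return {
            "added": sorted(new - old),  # backward-compatible
            "breaking": sorted(old - new),  # removed or renamed fields
        }


    # Adding a (hypothetical) "reason" field is safe; renaming "user" to "subject" is not.
    additive = classify_schema_change(ROLE_EVENT_V1, ROLE_EVENT_V1 | {"reason"})
    renamed = classify_schema_change(ROLE_EVENT_V1, (ROLE_EVENT_V1 - {"user"}) | {"subject"})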

Alternatives Considered
***********************

``django-simple-history`` on ``ExtendedCasbinRule`` as the attribution audit trail
===================================================================================

Rejected for three reasons:

- ``save_policy`` rewrites the table with a bulk delete followed by a bulk create. Django does
  not send ``post_save`` for bulk creates, so history that relies on model signals does not
  capture the re-created rows.
- Even if bulk writes were recorded, every policy reload would produce a new snapshot. The
  ``history_date`` would reflect when the table was written, not when a role was assigned, and
  snapshot diffs could not tell apart "Alice was assigned instructor" from "policy reloaded,
  Alice already had the role."
- ``ExtendedCasbinRule`` fields (``ptype``, ``v0``--``v5``) are semi-opaque and require an
  interpretation layer. ``RoleAssignmentAudit`` translates them at write time.
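
The interpretation layer mentioned in the last point can be sketched as a translation from a raw rule row to a readable assignment. For illustration only, the sketch assumes that role assignments are stored as ``ptype == "g"`` with ``v0`` as the subject, ``v1`` as the role, and ``v2`` as the scope. The actual column layout is defined by openedx-authz's model, not by this sketch, and ``interpret_rule`` is hypothetical.

.. code:: python

    from typing import NamedTuple, Optional, Sequence


    class RoleAssignmentView(NamedTuple):
        user: str
        role: str
        scope: str


    def interpret_rule(ptype: str, values: Sequence[Optional[str]]) -> Optional[RoleAssignmentView]:
        """Translates a raw (ptype, v0..v5) row into a role assignment, if it is one."""
        # Assumed layout for illustration: g rows carry subject, role, scope in v0..v2.
        if ptype != "g" or len(values) < 3 or not all(values[:3]):
            return None
        user, role, scope = values[0], values[1], values[2]
        return RoleAssignmentView(user=user, role=role, scope=scope)


    COURSE = "course-v1^course-v1:Org+Course+Run"
    assignment = interpret_rule("g", ["user^alice", "role^instructor", COURSE, None, None, None])
    not_an_assignment = interpret_rule("p", ["role^instructor", "act^edit_content", COURSE, None, None, None])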

``django-simple-history`` remains the right tool for Option B (point-in-time state
reconstruction), where it is a snapshot mechanism, not an operation log.

Use Cases Addressed
*******************

+------------------------------------------------------------+----------------+
| Description                                                | Flow           |
+============================================================+================+
| Operator: who assigned a role to a user, and when?         | Attribution    |
+------------------------------------------------------------+----------------+
| Operator: who removed a role from a user, and when?        | Attribution    |
+------------------------------------------------------------+----------------+
| Operator: full role history for a given user               | Attribution    |
+------------------------------------------------------------+----------------+
| Operator: access control history for a given resource      | Attribution    |
+------------------------------------------------------------+----------------+
| Developer: hook into role lifecycle events from a plugin   | Attribution    |
+------------------------------------------------------------+----------------+
| Operator/Developer: query role assignment history via API  | Attribution    |
+------------------------------------------------------------+----------------+
| Developer: understand why a permission check was denied    | Explainability |
+------------------------------------------------------------+----------------+
| Operator/Developer: inspect a user's current permissions   | Explainability |
+------------------------------------------------------------+----------------+

Deferred: resource access history / usage auditing; export to SIEM / Aspects (available as
a side effect of the event signal once an event bus is configured, not a first-class
deliverable of this ADR).

References
**********

- `ADR 0002`_
- `ADR 0004`_
- `ADR 0005`_
- `Auth0 FGA Logging API`_
- `openedx-events documentation`_
- `django-simple-history documentation`_
- `django-crum documentation`_
- OEPM-Spike: RBAC AuthZ Auditability

.. _ADR 0002: https://github.com/openedx/openedx-authz/blob/main/docs/decisions/0002-authorization-model-foundation.rst
.. _ADR 0004: https://github.com/openedx/openedx-authz/blob/main/docs/decisions/0004-technology-selection.rst
.. _ADR 0005: https://github.com/openedx/openedx-authz/blob/main/docs/decisions/0005-architecture-and-data-modeling.rst
.. _Auth0 FGA Logging API: https://auth0.com/blog/auth0-fga-logging-api-a-complete-audit-trail-for-authorization/
.. _openedx-events documentation: https://docs.openedx.org/projects/openedx-events/en/latest/
.. _django-simple-history documentation: https://django-simple-history.readthedocs.io/
.. _django-crum documentation: https://pypi.org/project/django-crum/