Draft - 2026-04-13
The system is transitioning from the legacy permissions model (CourseAccessRole)
to the new openedx-authz system.
Currently, migrations between the two systems are performed manually using Django management commands:
- authz_migrate_course_authoring (forward migration)
- authz_rollback_course_authoring (rollback migration)
In ADR 0010 and ADR 0011 it was established that migrations must occur automatically when
the feature flag authz.enable_course_authoring changes state, but the definition of
the specific mechanism was deferred. This ADR addresses that gap.
The current manual approach has the following problems:
- Access disparity: Many users have access to Django Admin and can toggle the flag, while significantly fewer have permission to run management commands. This creates an operational gap where the flag state can change independently of the migration process. As a result, coordination is required between different roles (those managing flags vs. those executing migrations), increasing the risk of delays, misalignment, and inconsistent system state.
- Outage window: When a flag change and the corresponding migration command are not executed atomically, there is a period where the flag points to one system but the permission data still lives in the other. Any permission check made during this window will fail, causing real outages for affected courses or organizations.
- No user feedback: Users have no way to know the result of a migration without inspecting logs manually.
- No concurrency protection: Nothing prevents operators from running the migration command multiple times simultaneously, which can lead to race conditions and data corruption.
We will implement an automatic and synchronous migration mechanism triggered by changes in the
authz.enable_course_authoring feature flag. The solution consists of:
- A post_save signal handler that detects flag changes and executes the migration.
- A tracking model to record migration status and errors.
- A database-level constraint to prevent concurrent migrations on the same scope.
Note
Scope Constraint
Automatic migration will only trigger for course-level and organization-level flag overrides, not for global (instance-wide) Waffle flag changes. The reason is that a global flag change could affect a large number of courses simultaneously, introducing an unacceptable performance risk. Global flag changes must be handled via management commands by operators who explicitly accept the performance implications.
A concern was raised about the risks of triggering data migrations on a live instance. Data migrations are typically executed under controlled conditions (e.g., during maintenance windows) because any failure can leave the system in an invalid state. Triggering them automatically via a feature flag toggle introduces additional risk:
- Django Admin access is sometimes granted to instructors or non-technical staff who may not understand the implications of toggling the flag.
- A live instance may be processing requests concurrently, increasing the chance of partial failures or inconsistent transient states.
To address this, the automatic migration mechanism will be guarded by a Django setting:

    ENABLE_AUTOMATIC_AUTHZ_COURSE_AUTHORING_MIGRATION = False

This setting:
- Is disabled by default.
- Must be explicitly set to True by a site operator who understands the migration risks.
- Acts as a prerequisite check inside the signal handler: if it is not enabled, the signal detects the flag change but does not execute the migration. The operator must then run the migration manually using the existing management commands.
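The prerequisite check could look roughly like the sketch below. It assumes the handler reads the setting with a getattr fallback, so an instance that never defines the setting behaves as disabled. The SimpleNamespace objects stand in for django.conf.settings, and automatic_migration_enabled is a hypothetical helper name, not something defined by this ADR.

```python
from types import SimpleNamespace

SETTING_NAME = "ENABLE_AUTOMATIC_AUTHZ_COURSE_AUTHORING_MIGRATION"


def automatic_migration_enabled(settings) -> bool:
    """Return True only when the operator has explicitly opted in."""
    # A missing setting is treated the same as False (disabled by default).
    return getattr(settings, SETTING_NAME, False) is True


# Stand-ins for django.conf.settings in three deployments (illustrative only).
unset = SimpleNamespace()
disabled = SimpleNamespace(**{SETTING_NAME: False})
enabled = SimpleNamespace(**{SETTING_NAME: True})

print(automatic_migration_enabled(unset))
print(automatic_migration_enabled(disabled))
print(automatic_migration_enabled(enabled))
```

Comparing with "is True" rather than relying on truthiness is one possible design choice: it avoids treating a stray non-boolean value as an opt-in.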
A post_save handler is attached to WaffleFlagCourseOverrideModel and
WaffleFlagOrgOverrideModel for the authz.enable_course_authoring flag.
The handler fires after save() has written the record, so the new flag value is already persisted when the migration begins. Because post_save is sent inside any enclosing database transaction, the row becomes durable only once that transaction commits.
Both WaffleFlagCourseOverrideModel and WaffleFlagOrgOverrideModel extend
ConfigurationModel, which creates a new row on every save instead of updating the
existing record. This means the full change history for each scope is preserved in the
table. The previous override value is therefore always available as the most recent record
for the same scope that is not the one just saved.
If no previous record exists for the scope (this is the first override ever created for
it), the migration runs unconditionally based on the current enabled value, without
comparing against a previous state.
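The lookup of the previous record can be illustrated without a database. The sketch below is a simplification under stated assumptions: OverrideRow is a hypothetical stand-in for a row of either override table, and primary keys are assumed to grow with insertion order so that the highest pk is the newest row.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OverrideRow:
    """Hypothetical stand-in for one row of an override table."""
    pk: int
    scope_key: str
    enabled: bool


def previous_override(rows: Sequence[OverrideRow], saved: OverrideRow) -> Optional[OverrideRow]:
    """Return the newest row for the same scope, excluding the one just saved."""
    candidates = [r for r in rows if r.scope_key == saved.scope_key and r.pk != saved.pk]
    # Assumption: the highest pk is the most recent row for the scope.
    return max(candidates, key=lambda r: r.pk, default=None)


history = [
    OverrideRow(pk=1, scope_key="course-v1:Org+A+2026", enabled=False),
    OverrideRow(pk=2, scope_key="course-v1:Org+B+2026", enabled=True),
    OverrideRow(pk=3, scope_key="course-v1:Org+A+2026", enabled=True),  # just saved
]

print(previous_override(history, history[2]))  # earlier row for course A
print(previous_override(history, history[1]))  # first override for course B
```

In the real handler this would presumably be an ORM query that also filters on the flag name; the exact field names are left open here.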
The post_save handler:
- Queries the same flag override model for the previous record as described above.
- If no previous record exists, runs the migration based on the current enabled value without further comparison.
- If a previous record exists, compares its enabled value with the saved one to determine whether an effective transition occurred:
  - False → True: triggers a forward migration (Legacy → openedx-authz)
  - True → False: triggers a rollback migration (openedx-authz → Legacy)
  - No change: the handler does nothing. No tracking record is created and no migration runs.
- Determines the scope (course or organization) from the model being saved.
- Calls the utility function synchronously with the migration parameters.
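The decision rules in the list above can be sketched as a small pure function. The name decide_migration and the string constants are hypothetical. The sketch also reads "runs the migration based on the current enabled value" as forward when the first override is enabled and rollback when it is disabled, which is an interpretation of the text rather than something it spells out.

```python
from typing import Optional

FORWARD = "forward"
ROLLBACK = "rollback"


def decide_migration(previous_enabled: Optional[bool], current_enabled: bool) -> Optional[str]:
    """Map a flag transition to a migration type, or None when nothing should run.

    previous_enabled is None when no earlier override exists for the scope.
    """
    if previous_enabled is not None and previous_enabled == current_enabled:
        return None  # no effective change: no tracking record, no migration
    return FORWARD if current_enabled else ROLLBACK


print(decide_migration(False, True))
print(decide_migration(True, False))
print(decide_migration(True, True))
print(decide_migration(None, True))
```

Keeping the decision separate from the signal wiring makes the transition table easy to unit test without touching the database.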
A new model is introduced to track the lifecycle of each migration operation:

    from django.db import models

    class AuthzCourseAuthoringMigrationRun(models.Model):
        migration_type = models.CharField(max_length=20)  # forward / rollback
        scope_type = models.CharField(max_length=20)  # course / org
        scope_key = models.CharField(max_length=255)
        status = models.CharField(max_length=20)  # running, completed, partial_success, failed, skipped
        created_at = models.DateTimeField(auto_now_add=True)
        updated_at = models.DateTimeField(auto_now=True)
        completed_at = models.DateTimeField(null=True, blank=True)
        metadata = models.JSONField(default=dict)

This model is registered in Django Admin so users can inspect migration history and diagnose failures without needing to access logs directly.
A higher-level orchestration layer (separate from the existing utility functions) will be
responsible for creating and updating AuthzCourseAuthoringMigrationRun records. This
layer wraps the core migration logic, ensuring that lifecycle tracking (opening a
running record, handling errors, and writing the final status) is applied consistently
regardless of whether the migration is triggered by the signal handler or a management
command.
The status field reflects the precise outcome of each run. The possible values are:
- running: the migration is actively executing.
- completed: all records were migrated successfully.
- partial_success: the migration process ran to completion, but one or more individual records failed and were skipped. The metadata field contains details about the failures.
- failed: a critical error prevented the migration from completing (e.g., an unhandled exception or infrastructure problem). The metadata field contains the exception details.
- skipped: the migration was not attempted because another run for the same scope was already active.
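How the orchestration layer might arrive at a final status can be sketched in memory. Everything here is illustrative: MigrationRun stands in for an AuthzCourseAuthoringMigrationRun row, run_with_tracking is a hypothetical name, and treating ValueError as the recoverable per-record error is an assumption made only to separate partial_success from failed. The skipped status is not covered because it depends on the concurrency guard.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class MigrationRun:
    """In-memory stand-in for an AuthzCourseAuthoringMigrationRun row."""
    status: str = "running"
    metadata: dict = field(default_factory=dict)


def run_with_tracking(items: Iterable[str], migrate_one: Callable[[str], None]) -> MigrationRun:
    """Open a running record, migrate each item, then write the final status."""
    run = MigrationRun()
    failures: List[dict] = []
    try:
        for item in items:
            try:
                migrate_one(item)
            except ValueError as exc:  # assumed per-record, recoverable error
                failures.append({"item": item, "error": str(exc)})
    except Exception as exc:  # anything else is treated as a critical failure
        run.status = "failed"
        run.metadata["exception"] = repr(exc)
        return run
    run.status = "partial_success" if failures else "completed"
    if failures:
        run.metadata["failures"] = failures
    return run


def flaky(item: str) -> None:
    """Fail on one specific record to simulate a per-record error."""
    if item == "bad":
        raise ValueError("cannot migrate role")


def broken(item: str) -> None:
    """Simulate an infrastructure problem that aborts the whole run."""
    raise RuntimeError("database unavailable")


print(run_with_tracking(["a", "b"], lambda item: None).status)
print(run_with_tracking(["a", "bad"], flaky).status)
print(run_with_tracking(["a"], broken).status)
```

Because the wrapper owns the status transitions, a signal handler and a management command could both call it and produce identical tracking records.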
To prevent overlapping migrations on the same scope, the tracking model enforces a
conditional UniqueConstraint on (scope_type, scope_key) filtered to
status="running". This guarantees that no second active migration record can be
inserted for the same scope regardless of how many processes attempt to do so concurrently.
Any attempt raises an IntegrityError, which the caller handles by recording a
skipped run and aborting.

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["scope_type", "scope_key"],
                condition=models.Q(status="running"),
                name="unique_active_migration_per_scope",
            )
        ]

- The user changes the authz.enable_course_authoring flag for a course or organization and saves the record. A new row is created in the override table.
- The post_save handler queries the same override model for the previous record (most recent row for the same scope, excluding the one just saved) to obtain the previous enabled value.
- The handler compares the previous value with the current enabled value. If no effective change occurred, it does nothing.
- If a transition is detected, the handler calls the utility function synchronously. The function creates an AuthzCourseAuthoringMigrationRun record with status="running" (the database constraint prevents this if another run for the same scope is already active) and executes the migration.
- The record is updated to its final status (completed, partial_success, failed, or skipped) before the post_save handler returns.
- The user can review the migration outcome via Django Admin on the AuthzCourseAuthoringMigrationRun model.
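The concurrency guard used when the running record is created can be demonstrated with the standard-library sqlite3 module, whose partial unique indexes behave like a conditional unique constraint. This is a sketch under stated assumptions, not the Django implementation: the table name migration_run and the helpers start_run and finish_run are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE migration_run ("
    " id INTEGER PRIMARY KEY,"
    " scope_type TEXT NOT NULL,"
    " scope_key TEXT NOT NULL,"
    " status TEXT NOT NULL)"
)
# Partial unique index: uniqueness applies only to rows whose status is 'running'.
conn.execute(
    "CREATE UNIQUE INDEX unique_active_migration_per_scope"
    " ON migration_run (scope_type, scope_key) WHERE status = 'running'"
)


def start_run(scope_type: str, scope_key: str) -> str:
    """Try to open a running record; record a skipped run if one is already active."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO migration_run (scope_type, scope_key, status)"
                " VALUES (?, ?, 'running')",
                (scope_type, scope_key),
            )
        return "running"
    except sqlite3.IntegrityError:
        with conn:
            conn.execute(
                "INSERT INTO migration_run (scope_type, scope_key, status)"
                " VALUES (?, ?, 'skipped')",
                (scope_type, scope_key),
            )
        return "skipped"


def finish_run(scope_type: str, scope_key: str, status: str) -> None:
    """Move the active run for a scope to its final status, releasing the guard."""
    with conn:
        conn.execute(
            "UPDATE migration_run SET status = ?"
            " WHERE scope_type = ? AND scope_key = ? AND status = 'running'",
            (status, scope_type, scope_key),
        )


print(start_run("course", "course-v1:Org+A+2026"))  # first attempt opens the run
print(start_run("course", "course-v1:Org+A+2026"))  # second attempt hits the index
```

Once finish_run moves the row out of the running state, a new run for the same scope can start again. Django's documentation notes that conditional unique constraints are not supported on every database backend (MySQL is one that lacks them), so the backend used by a given deployment is worth checking before relying on this guard.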
- Full observability: every migration run is recorded with its status, scope, and metadata in the tracking model.
- Concurrency-safe: the database-level constraint prevents overlapping migrations on the same scope, regardless of cache availability or worker failures.
- No manual intervention required for course-level or organization-level flag changes. Operators or users who have opted in do not need to remember to run management commands.
- Safe by default: the opt-in guard flag ensures that automatic migration is never triggered unexpectedly on instances where operators have not explicitly accepted the risks.
- Global flag changes are not covered: operators must still run management commands manually when enabling or disabling the flag at the instance level. This is a deliberate trade-off to avoid performance risks.
- Blocks the request: the migration runs synchronously inside the post_save signal, so the HTTP request that triggered the flag change does not return until the migration finishes. For large organization-level scopes this can cause noticeable latency or timeouts. This is an accepted trade-off given that automatic migration is scoped to course-level and organization-level overrides only (never global), and is opt-in.
- Runtime execution trade-offs: Unlike management commands typically executed during maintenance windows, this migration runs in a live production environment as part of normal system operation. This means it executes under concurrent load, with active requests and database activity, which introduces variability in execution conditions. This trade-off is inherent to enabling the feature flag to act as a real-time source of truth. The design prioritizes consistency between flag state and permission data over strictly controlled execution environments, while providing observability and recovery mechanisms to mitigate operational risk.
- Using pre_save to trigger the migration
  - The use of pre_save signals was discarded because they depend on a state transition that has not yet been committed to the database. Operating before persistence assumes a future-valid state that may not materialize. post_save was preferred to ensure migration logic operates only on confirmed states.
- Asynchronous execution via Celery
  - Given that automatic migration is scoped to course-level and organization-level overrides where migration volumes are bounded, synchronous execution is simpler and provides stronger consistency guarantees.
- Manual migration
  - Error-prone, not scalable, and inconsistent. The flag is the source of truth, but manual migration allows the system to end up in inconsistent states (e.g., flag enabled but data still in the legacy system), resulting in an operationally fragile design.
- Automatic global migration
  - Triggering automatic migration when the flag is changed globally (instance-wide) would risk performance degradation on large instances. This was explicitly ruled out: global migrations must remain operator-initiated via management commands.