From 818a44409e391e10c26d2b4811308dd6534f051f Mon Sep 17 00:00:00 2001
From: Bryann Valderrama <bryann.valderrama@edunext.co>
Date: Thu, 9 Apr 2026 13:26:35 -0500
Subject: [PATCH 1/6] docs: add course authoring automatic migration adr

---
 ...3-course-authoring-automatic-migration.rst | 233 ++++++++++++++++++
 1 file changed, 233 insertions(+)
 create mode 100644 docs/decisions/0013-course-authoring-automatic-migration.rst

diff --git a/docs/decisions/0013-course-authoring-automatic-migration.rst b/docs/decisions/0013-course-authoring-automatic-migration.rst
new file mode 100644
index 00000000..ed020ef9
--- /dev/null
+++ b/docs/decisions/0013-course-authoring-automatic-migration.rst
@@ -0,0 +1,233 @@
+0013: Course Authoring - Automatic Migration Triggered by Course Authoring Flag
+###############################################################################
+
+Status
+******
+
+**Draft** - *2026-04-09*
+
+Context
+*******
+
+The system is transitioning from the legacy permissions model (``CourseAccessRole``)
+to the new openedx-authz system.
+
+Currently, migrations between both systems are performed manually using Django management commands:
+
+- ``authz_migrate_course_authoring`` (forward migration)
+- ``authz_rollback_course_authoring`` (rollback migration)
+
+In `ADR 0011`_ and `ADR 0010`_ it was established that migration must occur automatically when
+the feature flag ``authz.enable_course_authoring`` changes state, but they deferred the definition of
+the specific mechanism. This ADR addresses that gap.
+
+The current manual approach presents the following risks:
+
+- **Inconsistency**: If an operator enables or disables the flag without running the migration
+  command, the permission data in both systems will diverge.
+- **No status tracking**: There is no visibility into whether a migration is in progress,
+  completed, or failed.
+- **No concurrency protection**: Multiple concurrent flag changes can trigger overlapping
+  migrations, leading to race conditions and data corruption.
+- **No user feedback**: Operators have no way to know the result of a migration without
+  inspecting logs manually.
+
+Decision
+********
+
+We will implement an automatic and asynchronous migration mechanism triggered by changes in the
+``authz.enable_course_authoring`` feature flag. The solution consists of:
+
+#. Django signal handler to detect flag state changes.
+#. Celery tasks to execute migrations asynchronously.
+#. A tracking model to record migration status and errors.
+#. A locking mechanism to prevent concurrent migrations on the same scope.
+
+.. note::
+
+  **Scope Constraint**
+
+  Automatic migration will only trigger for **course-level** and **organization-level** flag
+  overrides, not for global (instance-wide) Waffle flag changes. The reason is that a global
+  flag change could affect a large number of courses simultaneously, introducing an unacceptable
+  performance risk. Global flag changes must be handled via management commands by operators
+  who explicitly accept the performance implications.
+
+Operator Safety and Opt-in Design
+==================================
+
+A concern was raised about the risks of triggering data migrations on a live instance. Data
+migrations are typically executed under controlled conditions (e.g., during maintenance windows)
+because any failure can leave the system in an invalid state. Triggering them automatically via
+a feature flag toggle introduces additional risk:
+
+- Django Admin access is sometimes granted to instructors or non-technical staff who may not
+  understand the implications of toggling the flag.
+- A live instance may be processing requests concurrently, increasing the chance of partial
+  failures or inconsistent transient states.
+
+To address this, the automatic migration mechanism will be **guarded by a Django setting**:
+
+.. code:: python
+
+    ENABLE_AUTOMATIC_COURSE_AUTHORING_MIGRATION = False
+
+This setting:
+
+- Is **disabled by default**.
+- Must be explicitly set to ``True`` by a site operator who understands the migration risks.
+- Acts as a prerequisite check inside the signal handler: if it is not enabled, the signal
+  detects the flag change but does **not** dispatch the Celery task. The operator must then
+  run the migration manually using the existing management commands.
+
+This design preserves the automated behavior for operators who opt in while keeping the system
+safe for deployments where uncontrolled migrations are unacceptable.
+
+Detailed Design
+===============
+
+1. Utility Function Updates
+---------------------------
+
+The existing utility functions ``migrate_legacy_course_roles_to_authz`` and
+``migrate_authz_to_legacy_course_roles`` will be modified to incorporate the locking strategy
+(see **Concurrency Control** below) and the tracking logic (see **Migration Tracking Model** below)
+as integral steps of their execution.
+
+This approach ensures that both the Celery task and the management commands go through the same
+tracking and locking path.
+
+2. Migration Trigger (Django Signals)
+-------------------------------------
+
+``pre_save`` signal handlers are attached to ``WaffleFlagCourseOverrideModel`` and
+``WaffleFlagOrgOverrideModel``. When a save is detected for the ``authz.enable_course_authoring``
+flag, the handler:
+
+#. Compares the previous and new flag state to determine the transition direction:
+
+   - ``False → True``: triggers a **forward migration** (Legacy → openedx-authz)
+   - ``True → False``: triggers a **rollback migration** (openedx-authz → Legacy)
+
+#. Determines the scope (course or organization) from the model being saved.
+#. Dispatches an asynchronous Celery task with the migration parameters.
+
+.. note::
+  If no effective change is detected (i.e., the flag state is the same as the previous state),
+  the signal handler does nothing.
+
+3. Migration Tracking Model
+---------------------------
+
+A new model is introduced to track the lifecycle of each migration operation:
+
+.. code:: python
+
+    class CourseAuthoringMigrationRun(models.Model):
+        migration_type = models.CharField(max_length=20)  # forward / rollback
+        scope_type = models.CharField(max_length=20)  # course / org
+        scope_key = models.CharField(max_length=255)
+        status = models.CharField(max_length=20)  # pending, running, completed, skipped
+        created_at = models.DateTimeField(auto_now_add=True)
+        updated_at = models.DateTimeField(auto_now=True)
+        completed_at = models.DateTimeField(null=True, blank=True)
+        metadata = models.JSONField(default=dict)
+
+This model is registered in Django Admin so operators can inspect migration history and
+diagnose failures without needing to access logs directly.
+
+4. Asynchronous Execution
+-------------------------
+
+The Celery task acts strictly as a thin dispatcher. All core logic, including locking,
+tracking, and migration execution, is implemented in the utility functions (see
+**Utility Function Updates** above).
+
+All database operations within the migration itself execute inside an atomic transaction.
+If the migration fails, no data is deleted from either system, preserving consistency.
+
+5. Concurrency Control (Locking Strategy)
+-----------------------------------------
+
+To prevent race conditions caused by rapid or concurrent flag changes on the same scope, a
+distributed lock is implemented using the Django cache backend (Redis):
+
+.. code:: python
+
+    lock_key = f"authz_migration:{scope_type}:{scope_key}"
+
+The lock is acquired using ``cache.add()``, which is an atomic operation. The default TTL
+is **1 hour**. If a lock already exists for the given scope, the migration is skipped
+and a new tracking record is created with that status. This ensures that only one
+migration runs at a time for the same scope.
+
+6. Execution Flow
+------------------
+
+1. An operator changes the ``authz.enable_course_authoring`` flag for a course or
+   organization via Django Admin or a management command.
+2. The ``pre_save`` signal handler detects the state transition.
+3. A Celery task is dispatched asynchronously.
+4. The task calls the utility function, which acquires the lock, creates and updates the
+   ``CourseAuthoringMigrationRun`` record, and executes the migration.
+5. The operator can check the migration status via Django Admin on the ``CourseAuthoringMigrationRun``
+   model.
+
+Consequences
+************
+
+Positive consequences
+=====================
+
+- **Migration is decoupled from the request cycle**: the flag change returns immediately and
+  migration happens in the background.
+- **Full observability**: every migration run is recorded with its status, scope, and metadata
+  in the tracking model.
+- **Concurrency-safe**: the lock strategy prevents overlapping migrations on the same scope.
+- **No manual intervention required**: for course-level or organization-level flag changes. Operators
+  who have opted in do not need to remember to run management commands.
+- **Safe by default**: the opt-in guard flag ensures that automatic migration is never triggered
+  unexpectedly on instances where operators have not explicitly accepted the risks.
+
+Negative consequences / risks
+==============================
+
+- **Global flag changes are not covered**: operators must still run management commands
+  manually when enabling or disabling the flag at the instance level. This is a deliberate
+  trade-off to avoid performance risks.
+- **Celery dependency**: the system now requires a functioning Celery worker for automatic
+  migration. If workers are down, migrations will be queued but not executed until workers
+  recover.
+- **Lock TTL edge cases**: if a migration takes longer than 1 hour (unlikely but possible
+  for very large organizations), the lock will expire and a new migration for the same scope
+  could start concurrently for the same scope.
+
+Rejected Alternatives
+*********************
+
+**Synchronous execution in the signal handler**
+  Executing the migration directly inside the ``pre_save`` signal would block the HTTP
+  request that triggered the flag change, leading to timeouts for large scopes and poor
+  operator experience.
+
+**Manual migration**
+  Error-prone, not scalable, and inconsistent. The flag is the source of truth, but manual
+  migration allows the system to end up in inconsistent states (e.g., flag enabled but data
+  still in the legacy system), resulting in an operationally fragile design.
+
+**Automatic global migration**
+  Triggering automatic migration when the flag is changed globally (instance-wide) would
+  risk performance degradation on large instances. This was explicitly ruled out: global
+  migrations must remain operator-initiated via management commands.
+
+References
+**********
+
+* `Automatic Migration Spike`_
+* `ADR 0010`_
+* `ADR 0011`_
+
+.. _Automatic Migration Spike:
+   https://openedx.atlassian.net/wiki/spaces/OEPM/pages/6205112321/Spike+-+RBAC+AuthZ+-+Automatic+Role+Migration
+.. _ADR 0010: 0010-course-authoring-flag.rst
+.. _ADR 0011: 0011-course-authoring-migration-process.rst

From 944808e7ea100c03859f67e1645297d702f80f50 Mon Sep 17 00:00:00 2001
From: Bryann Valderrama <bryann.valderrama@edunext.co>
Date: Fri, 10 Apr 2026 09:52:59 -0500
Subject: [PATCH 2/6] chore: address pr review

- Rename Django setting
- Rename tracking model
---
 .../0013-course-authoring-automatic-migration.rst    | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/decisions/0013-course-authoring-automatic-migration.rst b/docs/decisions/0013-course-authoring-automatic-migration.rst
index ed020ef9..38764af0 100644
--- a/docs/decisions/0013-course-authoring-automatic-migration.rst
+++ b/docs/decisions/0013-course-authoring-automatic-migration.rst
@@ -18,8 +18,8 @@ Currently, migrations between both systems are performed manually using Django m
 - ``authz_rollback_course_authoring`` (rollback migration)
 
 In `ADR 0011`_ and `ADR 0010`_ it was established that migration must occur automatically when
-the feature flag ``authz.enable_course_authoring`` changes state, but they deferred the definition of
-the specific mechanism. This ADR addresses that gap.
+the feature flag ``authz.enable_course_authoring`` changes state, but the definition of
+the specific mechanism was deferred. This ADR addresses that gap.
 
 The current manual approach presents the following risks:
 
@@ -70,7 +70,7 @@ To address this, the automatic migration mechanism will be **guarded by a Django
 
 .. code:: python
 
-    ENABLE_AUTOMATIC_COURSE_AUTHORING_MIGRATION = False
+    ENABLE_AUTOMATIC_AUTHZ_COURSE_AUTHORING_MIGRATION = False
 
 This setting:
 
@@ -123,7 +123,7 @@ A new model is introduced to track the lifecycle of each migration operation:
 
 .. code:: python
 
-    class CourseAuthoringMigrationRun(models.Model):
+    class AuthzCourseAuthoringMigrationRun(models.Model):
         migration_type = models.CharField(max_length=20)  # forward / rollback
         scope_type = models.CharField(max_length=20)  # course / org
         scope_key = models.CharField(max_length=255)
@@ -169,8 +169,8 @@ migration runs at a time for the same scope.
 2. The ``pre_save`` signal handler detects the state transition.
 3. A Celery task is dispatched asynchronously.
 4. The task calls the utility function, which acquires the lock, creates and updates the
-   ``CourseAuthoringMigrationRun`` record, and executes the migration.
-5. The operator can check the migration status via Django Admin on the ``CourseAuthoringMigrationRun``
+   ``AuthzCourseAuthoringMigrationRun`` record, and executes the migration.
+5. The operator can check the migration status via Django Admin on the ``AuthzCourseAuthoringMigrationRun``
    model.
 
 Consequences

From eea181e1dc701dda5cbfbdde068e4ea02160ddd0 Mon Sep 17 00:00:00 2001
From: Bryann Valderrama <bryann.valderrama@edunext.co>
Date: Fri, 10 Apr 2026 11:52:16 -0500
Subject: [PATCH 3/6] docs: clarify concurrency protection risks

---
 docs/decisions/0013-course-authoring-automatic-migration.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/decisions/0013-course-authoring-automatic-migration.rst b/docs/decisions/0013-course-authoring-automatic-migration.rst
index 38764af0..fab6bd9b 100644
--- a/docs/decisions/0013-course-authoring-automatic-migration.rst
+++ b/docs/decisions/0013-course-authoring-automatic-migration.rst
@@ -27,8 +27,8 @@ The current manual approach presents the following risks:
   command, the permission data in both systems will diverge.
 - **No status tracking**: There is no visibility into whether a migration is in progress,
   completed, or failed.
-- **No concurrency protection**: Multiple concurrent flag changes can trigger overlapping
-  migrations, leading to race conditions and data corruption.
+- **No concurrency protection**: Nothing prevents operators from running the migration command
+  multiple times simultaneously, which can lead to race conditions and data corruption.
 - **No user feedback**: Operators have no way to know the result of a migration without
   inspecting logs manually.
 

From c55800c38e973eb44b10cdc96888b84900d44265 Mon Sep 17 00:00:00 2001
From: Bryann Valderrama <bryann.valderrama@edunext.co>
Date: Mon, 13 Apr 2026 16:05:11 -0500
Subject: [PATCH 4/6] chore: address PR reviews

---
 ...3-course-authoring-automatic-migration.rst | 233 +++++++++++-------
 1 file changed, 142 insertions(+), 91 deletions(-)

diff --git a/docs/decisions/0013-course-authoring-automatic-migration.rst b/docs/decisions/0013-course-authoring-automatic-migration.rst
index fab6bd9b..e97486a6 100644
--- a/docs/decisions/0013-course-authoring-automatic-migration.rst
+++ b/docs/decisions/0013-course-authoring-automatic-migration.rst
@@ -4,7 +4,7 @@
 Status
 ******
 
-**Draft** - *2026-04-09*
+**Draft** - *2026-04-13*
 
 Context
 *******
@@ -12,36 +12,41 @@ Context
 The system is transitioning from the legacy permissions model (``CourseAccessRole``)
 to the new openedx-authz system.
 
-Currently, migrations between both systems are performed manually using Django management commands:
+Currently, migrations between the two systems are performed manually using Django
+management commands:
 
 - ``authz_migrate_course_authoring`` (forward migration)
 - ``authz_rollback_course_authoring`` (rollback migration)
 
-In `ADR 0011`_ and `ADR 0010`_ it was established that migration must occur automatically when
+In `ADR 0010`_ and `ADR 0011`_ it was established that migrations must occur automatically when
 the feature flag ``authz.enable_course_authoring`` changes state, but the definition of
 the specific mechanism was deferred. This ADR addresses that gap.
 
-The current manual approach presents the following risks:
-
-- **Inconsistency**: If an operator enables or disables the flag without running the migration
-  command, the permission data in both systems will diverge.
-- **No status tracking**: There is no visibility into whether a migration is in progress,
-  completed, or failed.
+The current manual approach has the following problems:
+
+- **Access disparity**: Many users have access to Django Admin and can toggle the flag, while
+  significantly fewer have permission to run management commands. This creates an operational
+  gap where the flag state can change independently of the migration process. As a result,
+  coordination is required between different roles (those managing flags vs. those executing
+  migrations), increasing the risk of delays, misalignment, and inconsistent system state.
+- **Outage window**: When a flag change and the corresponding migration command are not executed
+  atomically, there is a period where the flag points to one system but the permission data
+  still lives in the other. Any permission check made during this window will fail, causing
+  real outages for affected courses or organizations.
+- **No user feedback**: Users have no way to know the result of a migration without
+  inspecting logs manually.
 - **No concurrency protection**: Nothing prevents operators from running the migration command
   multiple times simultaneously, which can lead to race conditions and data corruption.
-- **No user feedback**: Operators have no way to know the result of a migration without
-  inspecting logs manually.
 
 Decision
 ********
 
-We will implement an automatic and asynchronous migration mechanism triggered by changes in the
+We will implement an automatic and synchronous migration mechanism triggered by changes in the
 ``authz.enable_course_authoring`` feature flag. The solution consists of:
 
-#. Django signal handler to detect flag state changes.
-#. Celery tasks to execute migrations asynchronously.
+#. A ``post_save`` signal handler that detects flag changes and executes the migration.
 #. A tracking model to record migration status and errors.
-#. A locking mechanism to prevent concurrent migrations on the same scope.
+#. A database-level constraint to prevent concurrent migrations on the same scope.
 
 .. note::
 
@@ -54,7 +59,7 @@ We will implement an automatic and asynchronous migration mechanism triggered by
   who explicitly accept the performance implications.
 
 Operator Safety and Opt-in Design
-==================================
+=================================
 
 A concern was raised about the risks of triggering data migrations on a live instance. Data
 migrations are typically executed under controlled conditions (e.g., during maintenance windows)
@@ -77,46 +82,53 @@ This setting:
 - Is **disabled by default**.
 - Must be explicitly set to ``True`` by a site operator who understands the migration risks.
 - Acts as a prerequisite check inside the signal handler: if it is not enabled, the signal
-  detects the flag change but does **not** dispatch the Celery task. The operator must then
+  detects the flag change but does not execute the migration. The operator must then
   run the migration manually using the existing management commands.
 
-This design preserves the automated behavior for operators who opt in while keeping the system
-safe for deployments where uncontrolled migrations are unacceptable.
-
 Detailed Design
 ===============
 
-1. Utility Function Updates
----------------------------
+1. Migration Trigger (Django Signals)
+-------------------------------------
 
-The existing utility functions ``migrate_legacy_course_roles_to_authz`` and
-``migrate_authz_to_legacy_course_roles`` will be modified to incorporate the locking strategy
-(see **Concurrency Control** below) and the tracking logic (see **Migration Tracking Model** below)
-as integral steps of their execution.
+A ``post_save`` handler is attached to ``WaffleFlagCourseOverrideModel`` and
+``WaffleFlagOrgOverrideModel`` for the ``authz.enable_course_authoring`` flag.
 
-This approach ensures that both the Celery task and the management commands go through the same
-tracking and locking path.
+The handler fires after the record is committed to the database, so the new flag value is
+the authoritative and durable state of the system when the migration begins.
 
-2. Migration Trigger (Django Signals)
--------------------------------------
+Retrieving the previous state from the same model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Both ``WaffleFlagCourseOverrideModel`` and ``WaffleFlagOrgOverrideModel`` extend
+``ConfigurationModel``, which **creates a new row on every save** instead of updating the
+existing record. This means the full change history for each scope is preserved in the
+table. The previous override value is therefore always available as the most recent record
+for the same scope that is not the one just saved.
+
+If no previous record exists for the scope (this is the first override ever created for
+it), the migration runs unconditionally based on the current ``enabled`` value, without
+comparing against a previous state.
+
+``post_save`` execution
+~~~~~~~~~~~~~~~~~~~~~~~
 
-``pre_save`` signal handlers are attached to ``WaffleFlagCourseOverrideModel`` and
-``WaffleFlagOrgOverrideModel``. When a save is detected for the ``authz.enable_course_authoring``
-flag, the handler:
+The ``post_save`` handler:
 
-#. Compares the previous and new flag state to determine the transition direction:
+#. Queries the same flag override model for the previous record as described above.
+#. If no previous record exists, runs the migration based on the current ``enabled`` value
+   without further comparison.
+#. If a previous record exists, compares its ``enabled`` value with the saved one to
+   determine whether an effective transition occurred:
 
    - ``False → True``: triggers a **forward migration** (Legacy → openedx-authz)
    - ``True → False``: triggers a **rollback migration** (openedx-authz → Legacy)
+   - No change: the handler does nothing. No tracking record is created and no migration runs.
 
 #. Determines the scope (course or organization) from the model being saved.
-#. Dispatches an asynchronous Celery task with the migration parameters.
+#. Calls the utility function synchronously with the migration parameters.
 
-.. note::
-  If no effective change is detected (i.e., the flag state is the same as the previous state),
-  the signal handler does nothing.
-
-3. Migration Tracking Model
+2. Migration Tracking Model
 ---------------------------
 
 A new model is introduced to track the lifecycle of each migration operation:
@@ -127,51 +139,76 @@ A new model is introduced to track the lifecycle of each migration operation:
         migration_type = models.CharField(max_length=20)  # forward / rollback
         scope_type = models.CharField(max_length=20)  # course / org
         scope_key = models.CharField(max_length=255)
-        status = models.CharField(max_length=20)  # pending, running, completed, skipped
+        status = models.CharField(max_length=20)  # running, completed, partial_success, failed, skipped
         created_at = models.DateTimeField(auto_now_add=True)
         updated_at = models.DateTimeField(auto_now=True)
         completed_at = models.DateTimeField(null=True, blank=True)
         metadata = models.JSONField(default=dict)
 
-This model is registered in Django Admin so operators can inspect migration history and
+This model is registered in Django Admin so users can inspect migration history and
 diagnose failures without needing to access logs directly.
 
-4. Asynchronous Execution
--------------------------
-
-The Celery task acts strictly as a thin dispatcher. All core logic, including locking,
-tracking, and migration execution, is implemented in the utility functions (see
-**Utility Function Updates** above).
-
-All database operations within the migration itself execute inside an atomic transaction.
-If the migration fails, no data is deleted from either system, preserving consistency.
-
-5. Concurrency Control (Locking Strategy)
------------------------------------------
-
-To prevent race conditions caused by rapid or concurrent flag changes on the same scope, a
-distributed lock is implemented using the Django cache backend (Redis):
+A higher-level orchestration layer (separate from the existing utility functions) will be
+responsible for creating and updating ``AuthzCourseAuthoringMigrationRun`` records. This
+layer wraps the core migration logic, ensuring that lifecycle tracking (opening a
+``running`` record, handling errors, and writing the final status) is applied consistently
+regardless of whether the migration is triggered by the signal handler or a management
+command.
+
+Migration Outcome Semantics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``status`` field reflects the precise outcome of each run. The possible values are:
+
+- ``running``: the migration is actively executing.
+- ``completed``: all records were migrated successfully.
+- ``partial_success``: the migration process ran to completion, but one or more individual
+  records failed and were skipped. The ``metadata`` field contains details about the
+  failures.
+- ``failed``: a critical error prevented the migration from completing (e.g., an unhandled
+  exception or infrastructure problem). The ``metadata`` field contains the exception details.
+- ``skipped``: the migration was not attempted because another run for the same scope was
+  already active.
+
+3. Concurrency Control
+----------------------
+
+To prevent overlapping migrations on the same scope, the tracking model enforces a
+conditional ``UniqueConstraint`` on ``(scope_type, scope_key)`` filtered to
+``status="running"``. This guarantees that no second active migration record can be
+inserted for the same scope regardless of how many processes attempt to do so concurrently.
+Any attempt raises an ``IntegrityError``, which the caller handles by recording a
+``skipped`` run and aborting.
 
 .. code:: python
 
-    lock_key = f"authz_migration:{scope_type}:{scope_key}"
-
-The lock is acquired using ``cache.add()``, which is an atomic operation. The default TTL
-is **1 hour**. If a lock already exists for the given scope, the migration is skipped
-and a new tracking record is created with that status. This ensures that only one
-migration runs at a time for the same scope.
-
-6. Execution Flow
-------------------
-
-1. An operator changes the ``authz.enable_course_authoring`` flag for a course or
-   organization via Django Admin or a management command.
-2. The ``pre_save`` signal handler detects the state transition.
-3. A Celery task is dispatched asynchronously.
-4. The task calls the utility function, which acquires the lock, creates and updates the
-   ``AuthzCourseAuthoringMigrationRun`` record, and executes the migration.
-5. The operator can check the migration status via Django Admin on the ``AuthzCourseAuthoringMigrationRun``
-   model.
+    class Meta:
+        constraints = [
+            models.UniqueConstraint(
+                fields=["scope_type", "scope_key"],
+                condition=models.Q(status="running"),
+                name="unique_active_migration_per_scope",
+            )
+        ]
+
+4. Execution Flow
+-----------------
+
+1. The user changes the ``authz.enable_course_authoring`` flag for a course or
+   organization and saves the record. A new row is created in the override table.
+2. The ``post_save`` handler queries the same override model for the previous record
+   (most recent row for the same scope, excluding the one just saved) to obtain the
+   previous ``enabled`` value.
+3. The handler compares the previous value with the current ``enabled`` value. If no
+   effective change occurred, it does nothing.
+4. If a transition is detected, the handler calls the utility function synchronously. The
+   function creates an ``AuthzCourseAuthoringMigrationRun`` record with
+   ``status="running"`` (the database constraint prevents this if another run for the
+   same scope is already active) and executes the migration.
+5. The record is updated to its final status (``completed``, ``partial_success``, ``failed``,
+   or ``skipped``) before the ``post_save`` handler returns.
+6. The user can review the migration outcome via Django Admin on the
+   ``AuthzCourseAuthoringMigrationRun`` model.
 
 Consequences
 ************
@@ -179,36 +216,50 @@ Consequences
 Positive consequences
 =====================
 
-- **Migration is decoupled from the request cycle**: the flag change returns immediately and
-  migration happens in the background.
 - **Full observability**: every migration run is recorded with its status, scope, and metadata
   in the tracking model.
-- **Concurrency-safe**: the lock strategy prevents overlapping migrations on the same scope.
-- **No manual intervention required**: for course-level or organization-level flag changes. Operators
-  who have opted in do not need to remember to run management commands.
+- **Concurrency-safe**: the database-level constraint prevents overlapping migrations on the same
+  scope, regardless of cache availability or worker failures.
+- **No manual intervention required** for course-level or organization-level flag changes. Operators
+  or users who have opted in do not need to remember to run management commands.
 - **Safe by default**: the opt-in guard flag ensures that automatic migration is never triggered
   unexpectedly on instances where operators have not explicitly accepted the risks.
 
 Negative consequences / risks
-==============================
+=============================
 
 - **Global flag changes are not covered**: operators must still run management commands
   manually when enabling or disabling the flag at the instance level. This is a deliberate
   trade-off to avoid performance risks.
-- **Celery dependency**: the system now requires a functioning Celery worker for automatic
-  migration. If workers are down, migrations will be queued but not executed until workers
-  recover.
-- **Lock TTL edge cases**: if a migration takes longer than 1 hour (unlikely but possible
-  for very large organizations), the lock will expire and a new migration for the same scope
-  could start concurrently for the same scope.
+- **Blocks the request**: the migration runs synchronously inside the ``post_save`` signal,
+  so the HTTP request that triggered the flag change does not return until the migration
+  finishes. For large organization-level scopes this can cause noticeable latency or
+  timeouts. This is an accepted trade-off given that automatic migration is scoped to
+  course-level and organization-level overrides only (never global), and is opt-in.
+- **Runtime execution trade-offs**: Unlike management commands typically executed during
+  maintenance windows, this migration runs in a live production environment as part of
+  normal system operation. This means it executes under concurrent load, with active
+  requests and database activity, which introduces variability in execution conditions.
+  This trade-off is inherent to enabling the feature flag to act as a real-time source
+  of truth. The design prioritizes consistency between flag state and permission data
+  over strictly controlled execution environments, while providing observability and
+  recovery mechanisms to mitigate operational risk.
 
 Rejected Alternatives
 *********************
 
-**Synchronous execution in the signal handler**
-  Executing the migration directly inside the ``pre_save`` signal would block the HTTP
-  request that triggered the flag change, leading to timeouts for large scopes and poor
-  operator experience.
+**Using pre_save to trigger the migration**
+  A ``pre_save`` handler could detect the transition direction and execute the migration
+  before the flag change is written. This approach violates ACID principles: at the moment
+  ``pre_save`` fires, the new flag value has not yet been committed to the database. If the
+  subsequent ``save()`` were to fail (e.g., a validation error, a database constraint
+  violation, or a network issue), the migration would have already run against a state that
+  was never persisted, leaving the permission data inconsistent with the actual flag value.
+
+**Asynchronous execution via Celery**
+  Given that automatic migration is scoped to course-level and organization-level
+  overrides where migration volumes are bounded, synchronous execution is simpler
+  and provides stronger consistency guarantees.
 
 **Manual migration**
   Error-prone, not scalable, and inconsistent. The flag is the source of truth, but manual

From 72c52c701a5042531cd0a2b19fc48746a0d80748 Mon Sep 17 00:00:00 2001
From: Bryann Valderrama <bryann.valderrama@edunext.co>
Date: Tue, 14 Apr 2026 13:48:50 -0500
Subject: [PATCH 5/6] docs: refine rationale for rejecting pre_save in
 migration logic

---
 .../0013-course-authoring-automatic-migration.rst      | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/docs/decisions/0013-course-authoring-automatic-migration.rst b/docs/decisions/0013-course-authoring-automatic-migration.rst
index e97486a6..967a92fc 100644
--- a/docs/decisions/0013-course-authoring-automatic-migration.rst
+++ b/docs/decisions/0013-course-authoring-automatic-migration.rst
@@ -249,12 +249,10 @@ Rejected Alternatives
 *********************
 
 **Using pre_save to trigger the migration**
-  A ``pre_save`` handler could detect the transition direction and execute the migration
-  before the flag change is written. This approach violates ACID principles: at the moment
-  ``pre_save`` fires, the new flag value has not yet been committed to the database. If the
-  subsequent ``save()`` were to fail (e.g., a validation error, a database constraint
-  violation, or a network issue), the migration would have already run against a state that
-  was never persisted, leaving the permission data inconsistent with the actual flag value.
+  The use of pre_save signals was discarded because they depend on a state transition
+  that has not yet been committed to the database. Operating before persistence assumes
+  a future-valid state that may not materialize. post_save was preferred to ensure
+  migration logic operates only on confirmed states.
 
 **Asynchronous execution via Celery**
   Given that automatic migration is scoped to course-level and organization-level

From 7bd91945c01cc767650e1557d595b30ed3019e4c Mon Sep 17 00:00:00 2001
From: Bryann Valderrama <bryann.valderrama@edunext.co>
Date: Wed, 15 Apr 2026 07:13:19 -0500
Subject: [PATCH 6/6] docs: enhance migration outcome semantics with metadata
 details

---
 docs/decisions/0013-course-authoring-automatic-migration.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/decisions/0013-course-authoring-automatic-migration.rst b/docs/decisions/0013-course-authoring-automatic-migration.rst
index 967a92fc..038d688e 100644
--- a/docs/decisions/0013-course-authoring-automatic-migration.rst
+++ b/docs/decisions/0013-course-authoring-automatic-migration.rst
@@ -161,10 +161,11 @@ Migration Outcome Semantics
 The ``status`` field reflects the precise outcome of each run. The possible values are:
 
 - ``running``: the migration is actively executing.
-- ``completed``: all records were migrated successfully.
+- ``completed``: all records were migrated successfully. The ``metadata`` field contains the
+  details about the successful migrations.
 - ``partial_success``: the migration process ran to completion, but one or more individual
   records failed and were skipped. The ``metadata`` field contains details about the
-  failures.
+  failures and successfully migrated records.
 - ``failed``: a critical error prevented the migration from completing (e.g., an unhandled
   exception or infrastructure problem). The ``metadata`` field contains the exception details.
 - ``skipped``: the migration was not attempted because another run for the same scope was