Skip to content

feat: Libraries v2 backup endpoint#37419

Merged
ormsbee merged 1 commit intoopenedx:masterfrom
rodmgwgu:feat/librarybackupapi
Oct 6, 2025
Merged

feat: Libraries v2 backup endpoint#37419
ormsbee merged 1 commit intoopenedx:masterfrom
rodmgwgu:feat/librarybackupapi

Conversation

@rodmgwgu
Copy link
Copy Markdown
Contributor

@rodmgwgu rodmgwgu commented Oct 1, 2025

Description

Adds endpoints for Library v2 backup functionality, related to the new library import/export functionality being implemented in openedx-learning by @dwong2708.

Use Case

  • Start an asynchronous task to back up the content of a library to a .zip file
  • Get a status on an asynchronous export task, including the download url

Example Requests

  • POST /api/libraries/v2/{library_id}/backup/
  • GET /api/libraries/v2/{library_id}/backup/?task_id={task_id}

POST Response Values

If the import task is started successfully, an HTTP 200 "OK" response is returned.

The HTTP 200 response has the following values:

  • task_id: UUID of the created task, usable for checking status

Example POST Response

{
    "task_id": "7069b95b-ccea-4214-b6db-e00f27065bf7"
}

GET Parameters

A GET request can include the following parameters:

  • task_id: (required) The UUID of the task to check.

GET Response Values

If the import task is found successfully by the UUID provided, an HTTP 200 "OK" response is returned.

The HTTP 200 response has the following values:

  • state: String description of the state of the task
  • url: (may be null) If the task is complete, a URL to download the .zip file

Example GET Response

{
    "state": "Succeeded",
    "url": "/media/user_tasks/2025/10/03/lib-wgu-csprob-2025-10-03-153633.zip"
}

Supporting information

Resolves: openedx/openedx-core#387
Required by: openedx/frontend-app-authoring#2448

Testing instructions

  1. Create a new library from the studio interface (from the "Libraries Beta" tab, not "Legacy Libraries")
  2. note the Library key from the URL when looking at the library details (something like lib:WGU:CSPROB)
  3. do a POST to /api/libraries/v2/lib:WGU:CSPROB/backup/ (change the library key with yours)
  4. You should get a response like {"task_id":"747e5ab3-5aef-4aef-88a3-6a195c849e12"}
  5. do a GET to /api/libraries/v2/lib:WGU:CSPROB/backup/?task_id=747e5ab3-5aef-4aef-88a3-6a195c849e12 (change the library key and the task_id to yours)
  6. You should get a response like:
{
    "state": "Succeeded",
    "url": "/media/user_tasks/2025/10/03/lib-wgu-csprob-2025-10-03-153633.zip"
}

Deadline

Ulmo Release

Other information

  • Depends on changes in openedx-learning, which are partially merged as of Oct 3 2025. This means the export works, but the contents of the generated zip file may be incomplete.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Oct 1, 2025
@openedx-webhooks
Copy link
Copy Markdown

openedx-webhooks commented Oct 1, 2025

Thanks for the pull request, @rodmgwgu!

This repository is currently maintained by @openedx/wg-maintenance-edx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

Comment thread openedx/core/djangoapps/content_libraries/api/libraries.py Outdated
Comment thread openedx/core/djangoapps/content_libraries/rest_api/libraries.py
@rodmgwgu rodmgwgu force-pushed the feat/librarybackupapi branch from ab2af32 to b9d8748 Compare October 2, 2025 15:32
@dwong2708
Copy link
Copy Markdown
Contributor

Could you please include the github issue in your description? Something like "Resolves: openedx/openedx-core#387" at the top.

Comment thread openedx/core/djangoapps/content_libraries/tasks.py Outdated
Comment thread openedx/core/djangoapps/content_libraries/tasks.py Outdated
@mphilbrick211 mphilbrick211 added the mao-onboarding Reviewing this will help onboard devs from an Axim mission-aligned organization (MAO). label Oct 2, 2025
@mphilbrick211 mphilbrick211 moved this from Needs Triage to Waiting on Author in Contributions Oct 2, 2025
@rodmgwgu rodmgwgu force-pushed the feat/librarybackupapi branch 3 times, most recently from a9e1d2e to 2333d13 Compare October 3, 2025 18:08
@rodmgwgu rodmgwgu marked this pull request as ready for review October 3, 2025 18:48
@rodmgwgu rodmgwgu requested a review from ormsbee October 3, 2025 18:49
@ormsbee ormsbee moved this from Waiting on Author to In Eng Review in Contributions Oct 3, 2025
Copy link
Copy Markdown
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few relatively minor questions and requests.

I'm a little uncomfortable with the amount of mocking necessary in the tests, and I do wish we could do a bit more in the way of end-to-end testing (e.g. creating a small library and exporting it). But that's probably not the best use of time at the moment.

Thank you!

Comment thread openedx/core/djangoapps/content_libraries/api/libraries.py Outdated
Comment thread openedx/core/djangoapps/content_libraries/rest_api/libraries.py
Comment thread openedx/core/djangoapps/content_libraries/api/libraries.py Outdated
Comment thread openedx/core/djangoapps/content_libraries/tasks.py Outdated
- Failed: Task failed and the export did not complete.
"""
ensure_cms("backup_library may only be executed in a CMS context")
set_code_owner_attribute_from_module(__name__)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line shouldn't be necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that most of the tasks on this module have the @set_code_owner_attribute decorator, which is not compatible with the @shared_task decorator when using UserTask, that's why I'm manually setting it here, as it's done also on the existing course export/import tasks.

Although I'm not sure I understand 100% why it's needed there, are you sure this is not needed here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the code owner is something we did at edX so that if the APM error monitoring started throwing a bunch of errors on the endpoint, the person on call would know who to contact to help triage the issue. It also makes it easier for teams to build dashboards around it. I guess there's no harm in keeping it, but I don't think anyone will be looking at the data that it generates.

@robrap, @feanil: Am I remembering that correctly?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This decorator is no longer needed. See:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!, I'll remove it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing set_code_owner_attribute_from_module, the "Semgrep code quality" check is now failing with:

test_root.semgrep.celery-missing-code-owner-function
Celery tasks need to be decorated with
@set_code_owner_attribute (from the
edx_django_utils.monitoring module) in order for us to
correctly track code-owners for errors and in other
monitoring.

      For more information, see the Celery section of "Add     
      Code_Owner Custom Attributes to an IDA" in the Monitoring
      How-Tos of <https://docs.openedx.org/projects/edx-django-
      utils/en/latest/>.

It seems there is a test specific for this still up

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies. You can restore it, or remove the test. It’s a DEPR that requires attention at some point.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, restored it for now.

Comment thread openedx/core/djangoapps/content_libraries/rest_api/libraries.py Outdated
Comment thread openedx/core/djangoapps/content_libraries/rest_api/libraries.py

with open(file_path, 'rb') as zipfile:
artifact = UserTaskArtifact(status=self.status, name='Output')
artifact.file.save(name=os.path.basename(zipfile.name), content=File(zipfile))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this generated file have a difficult-to-guess URL (e.g. a randomized value for the directory prefix to the file)? My understanding is that UserTaskArtifacts are not secured via auth by default, and that we have to rely on making difficult-to-brute-force URLs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the default filesystem store, this is correct, it's not secured by default, however other stores may implement security themselves, for example, the S3 store does have this and generates a temporary signed url for the file.

In my tests, the files generates using the filesystem storage look like "/media/user_tasks/2025/10/03/lib-wgu-csprob-2025-10-03-153633.zip", the only thing that would make them kind of difficult to guess are the datetime in the file name.

I could add a random postfix to the filename, I'm not sure if I can control the directory as it's managed by the UserTasks library.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the S3 version has a temp signed url for the file, I think that's acceptable. This is one of the things we'll want to include in the release notes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is more info on how this behaves, from the django-user-tasks docs: https://django-user-tasks.readthedocs.io/en/latest/authorization.html#artifact-url-access

@rodmgwgu rodmgwgu force-pushed the feat/librarybackupapi branch from a164a55 to fd78de7 Compare October 3, 2025 21:59
# Get the latest task for this user and library
task_status = UserTaskStatus.objects.filter(
user_id=user_id,
name__contains=str(library_key)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So at a low level, I think we'd want to filter by task_class (so there's no mixing of backup and restore tasks), and put a bound on created (automated processes may make many, many tasks over time).

But at a higher level, is this part of the API here to service the use case where someone wanders away from the backup page and then comes back to it? Because if so, I'd rather we just not handle that case yet, and require the user to stay on the page. It shouldn't take that long to create the backup, and it's not clear to me how long a backup should remain "good", or what exactly the logic for invalidating a backup download should be, if we allow people to access historical backups.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I removed that use case, now we always need to specify the task_id. thanks!

@rodmgwgu rodmgwgu force-pushed the feat/librarybackupapi branch from d40ac9e to dc941af Compare October 6, 2025 20:51
@rodmgwgu rodmgwgu requested a review from ormsbee October 6, 2025 21:19
Copy link
Copy Markdown
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small request, and then I think we're good to squash and merge. Thank you!

Comment on lines +706 to +715
task_status = None
if task_id:
# Get the specific task by ID
try:
task_status = UserTaskStatus.objects.get(task_id=task_id, user_id=user_id)
except UserTaskStatus.DoesNotExist:
return None

if not task_status:
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since task_id is now required, we don't need all this any longer. We can do something more like:

Suggested change
task_status = None
if task_id:
# Get the specific task by ID
try:
task_status = UserTaskStatus.objects.get(task_id=task_id, user_id=user_id)
except UserTaskStatus.DoesNotExist:
return None
if not task_status:
return None
try:
task_status = UserTaskStatus.objects.get(task_id=task_id, user_id=user_id)
except UserTaskStatus.DoesNotExist:
return None

@rodmgwgu rodmgwgu force-pushed the feat/librarybackupapi branch from dc941af to d56a1b0 Compare October 6, 2025 21:36
@ormsbee ormsbee merged commit 0c9997c into openedx:master Oct 6, 2025
65 checks passed
@github-project-automation github-project-automation Bot moved this from In Eng Review to Done in Contributions Oct 6, 2025

if task_status.state == UserTaskStatus.SUCCEEDED:
artifact = UserTaskArtifact.objects.get(status=task_status, name='Output')
result['url'] = artifact.file.storage.url(artifact.file.name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rodmgwgu CC @holaontiveros Do you think we can update this to return an absolute URL, before Ulmo? REST APIs should generally only return absolute URLs, doubly so if the field is called "url", and triply so if we're talking about file storage which may be on a totally separate server/domain like S3. But this API is currently returning just the path part.

I noticed this here in the corresponding frontend PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald I modeled this one around the existing course export code here, which is doing the same, it seems that if it's S3, it will give us an absolute url, but for filestorage it returns a relative path.

But what you mention makes sense, I'll review this to always return an absoulte URL.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald here is the PR addressing this issue: #37508

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mao-onboarding Reviewing this will help onboard devs from an Axim mission-aligned organization (MAO). open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Implement Library Backup Endpoint

7 participants