filer: switch workspace upload from import-file to /workspace/import #5165

Draft
shreyas-goenka wants to merge 2 commits into main from shreyas-goenka/import-api

Conversation

@shreyas-goenka
Contributor

@shreyas-goenka shreyas-goenka commented May 4, 2026

Summary

Replace POST /api/2.0/workspace-files/import-file/{path}?overwrite=… with the multipart variant of POST /api/2.0/workspace/import (via the SDK's Workspace.Upload + format=AUTO). The legacy endpoint is being deprecated and the new endpoint is the strategic replacement.

Why multipart and not the JSON body of /workspace/import: the JSON body is server-capped at 10 MiB. Multipart accepts the same sizes import-file did (verified up to 250 MiB against a real workspace), so DAB users shipping wheels/jars/large files keep working.

Error mapping

Switched from raw aerr.StatusCode/ErrorCode comparisons to errors.Is against the SDK's apierr sentinels:

  • ErrNotFound → noSuchDirectoryError (or mkdir+retry under CreateParentDirectories mode); also catches RESOURCE_DOES_NOT_EXIST.
  • ErrResourceAlreadyExists → fileAlreadyExistsError (the new endpoint reliably sets error_code RESOURCE_ALREADY_EXISTS).
  • ErrInvalidParameterValue + "Requested node type" message → fileAlreadyExistsError (existing object's node type doesn't match the upload — file vs notebook collision).
  • ErrPermissionDenied → permissionError.

Same sentinel pattern applied to Delete, ReadDir, and Stat for 404 detection. DIRECTORY_NOT_EMPTY keeps an explicit ErrorCode check since the SDK has no sentinel for it.

End-to-end verification

format=AUTO is verified for every workspace-filesystem object type DABs care about, against a real workspace:

| Local file | Workspace object_type |
| --- | --- |
| `.py` with `# Databricks notebook source` | NOTEBOOK (PYTHON), extension stripped |
| `.sql` with `-- Databricks notebook source` | NOTEBOOK (SQL), extension stripped |
| `.ipynb` | NOTEBOOK (PYTHON), extension stripped |
| `.py` without header | FILE |
| `.lvdash.json` | DASHBOARD, extension preserved |
| regular files | FILE |
| 60 MB binary | FILE (uploaded successfully — would have failed with JSON body) |

Alerts / jobs / pipelines / schemas / etc. are not files in the workspace; they're created via dedicated REST APIs and don't go through the filer.

Test plan

  • Unit tests in libs/filer/workspace_files_client_test.go cover the new error mapping.
  • libs/testserver/handlers.go extended with multipart handler at /workspace/import.
  • acceptance/internal/prepare_server.go normalizes multipart bodies (sorted form fields, file parts recorded as {filename, size}) so request fixtures stay deterministic.
  • ~70 acceptance fixtures regenerated.
  • End-to-end verification against a real workspace for files, all notebook types, dashboards, and 60 MB binary.
  • Integration test TestFilerWorkspaceNotebook assertion updated to assert path with extension (tc.name); same change as #5106 (filer: detect notebook already-exists across both error formats).
  • CI green on this PR.

This pull request and its description were written by Isaac.

Replace `POST /api/2.0/workspace-files/import-file/{path}?overwrite=…`
with the multipart variant of `POST /api/2.0/workspace/import` (via the
SDK's `Workspace.Upload` + `format=AUTO`). The legacy endpoint is being
deprecated and the new endpoint is the strategic replacement.

The multipart variant is required because the JSON body of /workspace/import
is server-capped at 10 MiB; multipart accepts the same sizes import-file did
(verified up to 250 MiB against a real workspace), so DAB users shipping
wheels/jars/large files keep working.

Error mapping uses SDK sentinels via errors.Is rather than raw status/error
code comparisons:

- ErrNotFound → noSuchDirectoryError (or mkdir+retry under
  CreateParentDirectories mode); also catches RESOURCE_DOES_NOT_EXIST.
- ErrResourceAlreadyExists → fileAlreadyExistsError (the new endpoint
  reliably sets error_code RESOURCE_ALREADY_EXISTS).
- ErrInvalidParameterValue + "Requested node type" message →
  fileAlreadyExistsError (existing object's node type doesn't match the
  upload — file vs notebook collision).
- ErrPermissionDenied → permissionError.

Apply the same sentinel-based pattern to Delete, ReadDir, and Stat for
404 detection, matching the existing usage in bundle/direct/util.go and
following AGENTS.md's rule against branching on err.Error() string content.

DIRECTORY_NOT_EMPTY in Delete keeps an explicit ErrorCode check since the
SDK has no sentinel for it.

Test plan:
- libs/filer/workspace_files_client_test.go covers the new error mapping.
- libs/testserver/handlers.go extended with a multipart handler at
  /workspace/import that surfaces 409s from the shared fake as 400 +
  RESOURCE_ALREADY_EXISTS to match real-workspace shape.
- acceptance/internal/prepare_server.go normalizes multipart bodies
  (sorted form fields, file parts recorded as {filename, size}) so
  request fixtures stay deterministic.
- ~70 acceptance fixtures regenerated for the new request shape.
- End-to-end verified against a real workspace for files, all notebook
  types (.py / .sql / .ipynb / .lvdash.json / .scala / .r), dashboards,
  and a 60 MB binary upload.
- Integration test TestFilerWorkspaceNotebook assertion updated to assert
  the path with extension (tc.name) — matches the absPath returned in
  fileAlreadyExistsError. Same change as #5106.

Co-authored-by: Isaac
…ted goldens

Add a focused acceptance test (acceptance/bundle/sync/upload-edge-cases) that
exercises the multipart upload pipeline with the inputs that differ in shape
between the legacy import-file and the new /workspace/import endpoints:

- A 12 MiB binary file (above the JSON-body 10 MiB cap that the multipart
  variant lifts).
- An empty file (multipart encodes an empty content part distinct from JSON's
  empty string).
- A python notebook (auto-detected as NOTEBOOK; testserver mirrors extension
  stripping).
- A .lvdash.json dashboard descriptor (real workspace assigns DASHBOARD;
  testserver records the upload-side request shape).
- A non-ASCII filename (héllo.txt — multipart encodes filenames with quoted
  string rules).
- A filename with a space.

Assertions inspect out.requests.txt and pin two things: the set of multipart_form.path values, and that every upload sets format=AUTO.

Also restore 11 golden out.requests.txt files that the previous regenerate
sweep accidentally deleted (templates/telemetry/*, run/inline-script/**,
run/scripts/**, resources/volumes/change-schema-name) — they were present on
main and silently disappeared during the rebase that brought their goldens
together with the removal patch.

Co-authored-by: Isaac
shreyas-goenka force-pushed the shreyas-goenka/import-api branch from 4db3fb6 to 6907378 on May 6, 2026 at 11:26.