
fix: scope large binary storage and cleanup by execution id #5280

Open
kunwp1 wants to merge 14 commits into apache:main from kunwp1:fix/large-binary-eid-lifecycle
@kunwp1
Contributor

@kunwp1 kunwp1 commented May 28, 2026

What changes were proposed in this PR?

Large binaries were stored in the shared texera-large-binaries bucket under flat keys of the form objects/{timestamp}/{uuid}, with no execution id in the key, and clearExecutionResources(eid) deleted all of them via LargeBinaryManager.deleteAllObjects(). Cleaning up one execution therefore erased the large binaries of every other execution, including those of other users.

This PR namespaces every large binary by its execution id and scopes deletion:

  • Object keys are now objects/{eid}/{uuid} on both the JVM and Python workers.
  • The execution id is carried to workers via a new InitializeExecutorRequest.executionId proto field, injected by the system at executor init. The user-facing largebinary() / new LargeBinary() APIs are unchanged.
  • Cleanup uses the new LargeBinaryManager.deleteByExecution(eid) (prefix delete of objects/{eid}/). Both JVM and Python engines share the bucket and key shape, so this single JVM-side delete removes binaries created by both.
  • deleteAllObjects() is removed.
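
The key scheme and scoped cleanup listed above can be sketched as follows. This is a minimal in-memory illustration, not the code in this PR: `FakeBucket`, `make_key`, and `delete_by_execution` are hypothetical names, a dict stands in for the texera-large-binaries bucket, and the legacy key value is made up.

```python
import uuid


def make_key(eid: int) -> str:
    """Build a key of the form objects/{eid}/{uuid}."""
    return f"objects/{eid}/{uuid.uuid4()}"


class FakeBucket:
    """Hypothetical in-memory stand-in for the shared bucket."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def delete_by_execution(self, eid: int) -> int:
        """Prefix delete: remove only keys under objects/{eid}/."""
        # The trailing slash keeps eid 1 from matching eid 12.
        prefix = f"objects/{eid}/"
        doomed = [k for k in self.objects if k.startswith(prefix)]
        for k in doomed:
            del self.objects[k]
        return len(doomed)


bucket = FakeBucket()
for eid in (1, 12):
    for _ in range(3):
        bucket.put(make_key(eid), b"payload")
# An illustrative key in the old objects/{timestamp}/... shape.
bucket.put("objects/legacy-timestamp/abc", b"old")

removed = bucket.delete_by_execution(1)
print(removed, len(bucket.objects))
```

In this sketch, deleting execution 1 removes its three objects and leaves both execution 12 and the legacy key in place, which is the isolation the bullets describe.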

Pre-existing objects under the old objects/{timestamp}/... scheme are left untouched.

Any related issues, documentation, discussions?

Closes #4123.

How was this PR tested?

Requires running ./bin/python-proto-gen.sh

Import the following JSON file to create two workflows, run both, and verify that each execution creates 6 objects and that cleaning up one execution does not remove the other execution's large binary objects.
Large.Binary.Python (1).json
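
One way to check the per-execution object counts from a listing of bucket keys is sketched below. It assumes the keys have already been listed by some other tool; `count_by_execution` and the sample keys are hypothetical and are not part of this PR.

```python
from collections import Counter


def count_by_execution(keys):
    """Count keys of the form objects/{eid}/{uuid} per execution id."""
    counts = Counter()
    for key in keys:
        parts = key.split("/")
        # Legacy objects/{timestamp}/{uuid} keys have the same shape and
        # would show up here as their own group.
        if len(parts) == 3 and parts[0] == "objects":
            counts[parts[1]] += 1
    return counts


# Illustrative listing: two executions with six objects each.
keys = [f"objects/{eid}/u{i}" for eid in ("101", "102") for i in range(6)]
counts = count_by_execution(keys)
print(dict(counts))
```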

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Anthropic), models Claude Opus 4.7 and Claude Sonnet 4.6

kunwp1 added 9 commits May 28, 2026 10:56
Also update existing call site in RegionExecutionCoordinator to pass
None for the new field (required because ScalaPB no_default_values_in_constructor is true).
…he#4123)

betterproto returns an empty (falsy) ExecutionIdentity for an unset
executionId field rather than None, so the previous `is not None` check
never triggered and an unset id would silently produce objects/0/...
Use truthiness so unset -> None -> create() raises, matching the JVM
invariant. Also moves a stray mid-file `import re` to the top.
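
The difference between the two checks in the commit message above can be illustrated with a small stand-in. This is a sketch only: `ExecutionIdentityStub` is a hypothetical class that mimics a message whose unset form is falsy, and the two resolver functions are illustrative, not the PR's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecutionIdentityStub:
    """Hypothetical stand-in for a generated message whose unset form is falsy."""

    id: int = 0

    def __bool__(self) -> bool:
        # Assumption for this sketch: a message holding only defaults is falsy.
        return self.id != 0


def resolve_with_none_check(msg) -> Optional[int]:
    # An unset message is still an object, so this branch always passes.
    return msg.id if msg is not None else None


def resolve_with_truthiness(msg) -> Optional[int]:
    # An unset (falsy) message maps to None, so the caller can raise.
    return msg.id if msg else None


unset = ExecutionIdentityStub()
print(resolve_with_none_check(unset), resolve_with_truthiness(unset))
```

With the `is not None` check the unset stub resolves to id 0, which is how a key such as objects/0/... could be produced; the truthiness check resolves it to None instead.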
@codecov-commenter

codecov-commenter commented May 28, 2026

Codecov Report

❌ Patch coverage is 75.86207% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.08%. Comparing base (ec12c88) to head (3330bf5).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...pache/texera/service/util/LargeBinaryManager.scala | 60.00% | 4 Missing and 2 partials ⚠️ |
| ...rg/apache/texera/web/service/WorkflowService.scala | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5280      +/-   ##
============================================
- Coverage     49.12%   49.08%   -0.04%     
+ Complexity     2378     2375       -3     
============================================
  Files          1051     1050       -1     
  Lines         40348    40304      -44     
  Branches       4279     4267      -12     
============================================
- Hits          19821    19784      -37     
+ Misses        19368    19358      -10     
- Partials       1159     1162       +3     
| Flag | Coverage Δ | *Carryforward flag |
| --- | --- | --- |
| access-control-service | 41.89% <ø> (ø) | |
| agent-service | 33.76% <ø> (ø) | Carriedforward from 8e1ebfb |
| amber | 51.58% <61.11%> (+0.01%) ⬆️ | |
| computing-unit-managing-service | 0.00% <ø> (ø) | |
| config-service | 0.00% <ø> (ø) | |
| file-service | 38.42% <ø> (+0.42%) ⬆️ | |
| frontend | 40.91% <ø> (-0.17%) ⬇️ | Carriedforward from 8e1ebfb |
| python | 90.81% <100.00%> (+0.01%) ⬆️ | |
| workflow-compiling-service | 56.81% <ø> (ø) | |

*This pull request uses carry forward flags. Click here to find out more.


kunwp1 added 3 commits May 28, 2026 13:29
…apache#4123)

Move the per-execution id out of StorageConfig (which holds only static
system configuration sourced from storage.conf) into a dedicated module-level
holder in large_binary_manager (set_current_execution_id), mirroring the JVM
LargeBinaryManager. The Python init handler sets it via that API.
Add get_current_execution_id() and route create() and the tests through it
instead of reading the module-level _current_execution_id directly, keeping
the holder's access encapsulated.
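
A module-level holder of the kind this commit message describes might look roughly like the sketch below. The names `set_current_execution_id`, `get_current_execution_id`, and `create` come from the commit message; the function bodies, the integer id type, and the `RuntimeError` are assumptions made for illustration.

```python
import uuid
from typing import Optional

# Module-level holder for the current execution id (None until initialized).
_current_execution_id: Optional[int] = None


def set_current_execution_id(eid: Optional[int]) -> None:
    """Called once at executor init (assumed) to record the execution id."""
    global _current_execution_id
    _current_execution_id = eid


def get_current_execution_id() -> Optional[int]:
    """Single read path so callers never touch the module variable directly."""
    return _current_execution_id


def create() -> str:
    """Return a new object key, refusing to run without an execution id."""
    eid = get_current_execution_id()
    if eid is None:
        # The error type is an assumption; the source only says create() raises.
        raise RuntimeError("execution id is not set")
    return f"objects/{eid}/{uuid.uuid4()}"
```

Routing both `create()` and the tests through the getter keeps the holder replaceable later without touching its callers.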
@kunwp1
Contributor Author

kunwp1 commented May 28, 2026

/request-review @Xiao-zhen-Liu

Could you review this PR, since you are an engine expert?

@github-actions github-actions Bot requested a review from Xiao-zhen-Liu May 28, 2026 22:43
Development

Successfully merging this pull request may close these issues.

Finish the Life Cycle of Large Binaries

2 participants