S3StorageClient.deleteDirectory only deletes the first ≤1000 objects per prefix #5281

@kunwp1

Description

What happened?

S3StorageClient.deleteDirectory(bucketName, directoryPrefix) lists objects under the prefix with a single listObjectsV2 call and then issues one deleteObjects batch. listObjectsV2 returns at most 1000 keys per page and the method does not paginate (no continuation-token loop), so only the first ≤1000 objects under the prefix are ever deleted. Any objects beyond the first 1000 are silently orphaned.

Related latent limit: AWS DeleteObjectsRequest accepts at most 1000 keys per call, so once listing is paginated, deletions must also be chunked into batches of ≤1000.

Expected: deleteDirectory should remove all objects under the prefix regardless of count.

Affected code

common/workflow-core/src/main/scala/org/apache/texera/service/util/S3StorageClient.scala, deleteDirectory (~lines 105–145): a single listObjectsV2 call (no isTruncated/nextContinuationToken loop) followed by a single deleteObjects call.

Suggested fix

Paginate the listing via the continuation token until isTruncated is false, accumulating keys, and delete them in batches of ≤1000 per DeleteObjectsRequest.
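A minimal sketch of the suggested control flow, not the actual patch. To keep it runnable without the AWS SDK, the two S3 calls are passed in as functions: `listPage` stands in for one listObjectsV2 call (given an optional continuation token) and `deleteBatch` stands in for one deleteObjects call. The names `PaginatedDelete`, `Page`, `listPage`, `deleteBatch`, and `deleteAllUnderPrefix` are hypothetical and do not exist in S3StorageClient.

```scala
object PaginatedDelete {

  // One page of a listing: the keys on that page, plus the continuation
  // token for the next page (None once the listing is no longer truncated).
  // Hypothetical type; in real code this would be derived from the
  // listObjectsV2 response.
  final case class Page(keys: Seq[String], nextToken: Option[String])

  // AWS DeleteObjectsRequest accepts at most 1000 keys per call.
  val MaxKeysPerDelete = 1000

  /** Lists every page under a prefix and deletes the keys in batches.
    *
    * @param listPage    assumed wrapper around a single listObjectsV2 call
    * @param deleteBatch assumed wrapper around a single deleteObjects call
    * @return the total number of keys handed to deleteBatch
    */
  def deleteAllUnderPrefix(
      listPage: Option[String] => Page,
      deleteBatch: Seq[String] => Unit
  ): Int = {
    @scala.annotation.tailrec
    def loop(token: Option[String], deletedSoFar: Int): Int = {
      val page = listPage(token)
      // Chunk defensively so no single delete call exceeds the limit.
      page.keys.grouped(MaxKeysPerDelete).foreach(deleteBatch)
      val total = deletedSoFar + page.keys.size
      page.nextToken match {
        case Some(next) => loop(Some(next), total) // more pages remain
        case None       => total                   // listing exhausted
      }
    }
    loop(None, 0)
  }
}
```

This variant deletes each page as it is listed instead of accumulating every key first, which keeps memory use bounded for very large prefixes. Accumulating all keys and then calling `grouped(1000)` once, as described above, would work equally well for the batch limit.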

Branch

main

Affected Area

Storage

Impact / Priority

(P3) Low–Medium — pre-existing; only affects executions producing >1000 objects under a single prefix, but causes silent storage leaks.
