BUD-12: Identical Media Deduplication #96

Open
v0l wants to merge 2 commits into hzrd149:master from v0l:bud-12-identical-media

v0l commented Mar 12, 2026

Summary

  • Adds BUD-12, a new optional spec for servers to detect and reject uploads of media identical to an already-stored blob
  • Servers respond with 409 Conflict and an X-Identical-Media: <sha256> header pointing to the existing equivalent blob
  • Detection method is left entirely to the server implementation (perceptual hashing, normalized hash, ML embeddings, etc.)
  • Clients receiving the response should use the returned hash as the canonical reference and mirror it to other servers via BUD-04 rather than re-uploading
  • Fully backwards compatible: clients that don't implement BUD-12 simply see a 409 upload failure
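
To illustrate the client side of the flow summarised above, here is a minimal sketch of how a client might tell a BUD-12 rejection apart from any other 409. The function name `identical_media_hash` is hypothetical, and the check that the header value is a 64-character hex string is an assumption of this sketch, not something stated in the PR.

```python
import re
from typing import Mapping, Optional

# Assumption: the header carries a lowercase-or-uppercase hex sha256 (64 chars).
SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


def identical_media_hash(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the existing blob's sha256 if this looks like a BUD-12 rejection.

    Returns None for any other response, including a 409 without a usable
    X-Identical-Media header, so callers can fall back to generic error handling.
    """
    if status != 409:
        return None
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-identical-media", "").strip().lower()
    return value if SHA256_HEX.match(value) else None
```

A client that gets a hash back would treat it as the canonical reference and mirror it via BUD-04. A client that gets `None` would report an ordinary upload failure.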


v0l commented Mar 13, 2026

Example UI for route96

[screenshot: localhost_8000_]

v0l added a commit to v0l/route96 that referenced this pull request Mar 13, 2026
Adds perceptual hash (pHash) based deduplication for image uploads per
the BUD-12 spec (hzrd149/blossom#96).

Backend:
- Compute pHash synchronously inside fs.put for every image upload;
  store the result on NewFileResult and propagate to FileUpload
- Insert the phash row inside the add_file transaction alongside the
  uploads row, satisfying the FK constraint on upload_phash
- On PUT /upload and PUT /media, query find_similar_images using the
  already-computed hash; return 409 Conflict with X-Identical-Media
  and X-Reason headers when a match is found within the configured
  Hamming distance
- New settings: identical_media_dedup (bool) and
  identical_media_dedup_distance (u32, default 0)

Frontend:
- IdenticalMediaError class in blossom.ts captures the existing sha256
  from X-Identical-Media on 409 responses
- Upload view shows a side-by-side comparison panel of the user's
  upload vs the existing server blob at full size (max-h-96)
- Mirror button calls PUT /mirror to register the existing blob to the
  user's account, then dismisses the panel
- Config editor: bool toggle for identical_media_dedup and integer
  field for identical_media_dedup_distance with min/max bounds
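
As a rough illustration of the Hamming-distance match described in the commit message, the sketch below compares perceptual hashes held in memory. It is not route96's code: the names `hamming_distance` and `find_similar`, the representation of a pHash as a Python `int`, and the in-memory dictionary are all assumptions made for the example. The default threshold of 0 mirrors the stated default of `identical_media_dedup_distance`.

```python
from typing import Dict, Optional


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_similar(phash: int, stored: Dict[str, int], max_distance: int = 0) -> Optional[str]:
    """Return the sha256 of the closest stored blob within max_distance, or None.

    `stored` maps a blob's sha256 (hex string) to its perceptual hash.
    """
    best_sha: Optional[str] = None
    best_dist: Optional[int] = None
    for sha256, other in stored.items():
        dist = hamming_distance(phash, other)
        # Keep only candidates inside the threshold, preferring the nearest one.
        if dist <= max_distance and (best_dist is None or dist < best_dist):
            best_sha, best_dist = sha256, dist
    return best_sha
```

With a distance of 0 only exact pHash matches are reported. Raising the threshold catches more near-duplicates but also increases the chance of rejecting images that are genuinely different.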

v0l commented Mar 13, 2026

Added a mechanism for clients to acknowledge a deduplication response and request uploading a distinct copy anyway.

When a server returns 409 Conflict with X-Identical-Media: <sha256>, the client can echo that same header back in a retry request to signal it is aware of the existing equivalent blob and intentionally wants to store a separate copy.

Servers MAY honour this and proceed with the upload, or MAY continue to reject it; the choice is entirely up to server policy. This keeps deduplication enforcement server-side while giving clients a standard way to express intent.
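
One way a server could apply this policy is sketched below. The function `should_accept_upload` and the `allow_override` flag are hypothetical names. The sketch also assumes that an acknowledgement only counts when the echoed hash equals the hash the server matched, which is one reading of "echo that same header back".

```python
from typing import Mapping, Optional


def should_accept_upload(
    match: Optional[str],
    request_headers: Mapping[str, str],
    allow_override: bool,
) -> bool:
    """Decide whether an upload proceeds (True) or is rejected with 409 (False).

    `match` is the sha256 of the existing equivalent blob, or None if the
    server's detection found nothing. `allow_override` stands in for server
    policy on whether acknowledged duplicates are allowed.
    """
    if match is None:
        return True  # nothing equivalent is stored, so there is nothing to dedupe
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    echoed = lowered.get("x-identical-media", "").strip().lower()
    acknowledged = echoed == match.lower()
    return acknowledged and allow_override
```

A server that never permits duplicates would simply run with `allow_override` set to `False`, in which case the echoed header has no effect and the client keeps receiving the 409.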
