Skip to content

fix: treat Compression.NONE sentinel as uncompressed in COPY INTO formats#75

Open
simozzy wants to merge 1 commit into
databendlabs:mainfrom
PlaidCloud:upstream-pr/compression-none-sentinel
Open

fix: treat Compression.NONE sentinel as uncompressed in COPY INTO formats#75
simozzy wants to merge 1 commit into
databendlabs:mainfrom
PlaidCloud:upstream-pr/compression-none-sentinel

Conversation

@simozzy

@simozzy simozzy commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Problem

Compression.NONE is a truthy Enum member, so the if compression: guard in CSVFormat, TSVFormat, NDJSONFormat, and ParquetFormat treats the no-compression sentinel as "a codec was specified":

if compression:                                   # Compression.NONE is truthy!
    if compression not in [Compression.ZSTD, Compression.SNAPPY]:
        raise TypeError('Compression should be None, ZStd, or Snappy.')
    self.options["COMPRESSION"] = compression.value
  • ParquetFormat(compression=Compression.NONE) raises TypeError: Compression should be None, ZStd, or Snappy. — even though NONE is the "no compression" case.
  • The CSV/TSV/NDJSON classes don't raise, but emit a spurious COMPRESSION = NONE option for the sentinel.

Fix

Skip the block for both None and the Compression.NONE sentinel, so the uncompressed case renders no COMPRESSION option:

if compression and compression is not Compression.NONE:
    ...

Applied to all four format classes. Explicit codecs are unchanged, and ParquetFormat still rejects codecs its writer can't use (gzip/bz2/zip).

The four format classes also now set inherit_cache = False (matching CopyIntoTable/CopyIntoLocation), which silences SQLAlchemy's "will not make use of SQL compilation caching" warning when a format is compiled — needed for the new compile tests to pass under the suite's warning-as-error config.

Tests

tests/test_copy_format.py — a fixtures.TestBase + AssertsCompiledSQL class mirroring tests/test_copy_into.py, asserting the rendered FILE_FORMAT = (...) via assert_compile:

  • None and Compression.NONE emit no COMPRESSION option, across all four classes.
  • Explicit codecs are still emitted where valid.
  • ParquetFormat still raises TypeError for unsupported codecs.

…mats

Compression.NONE is a truthy Enum member, so the `if compression:` guard
in CSVFormat/TSVFormat/NDJSONFormat/ParquetFormat treated the
no-compression sentinel as 'a codec was specified'. For ParquetFormat this
raised TypeError('Compression should be None, ZStd, or Snappy.') outright;
for the others it emitted a spurious COMPRESSION = NONE option.

Guard now skips both None and Compression.NONE, so the uncompressed case
renders no COMPRESSION option. Explicit codecs are unchanged, and
ParquetFormat still rejects codecs its writer can't use (gzip/bz2/zip).

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9239985e4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

)
self.options["MISSING_FIELD_AS"] = f"{missing_field_as}"
if compression:
if compression and compression is not Compression.NONE:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject NONE for Parquet instead of using defaults

When callers pass ParquetFormat(compression=Compression.NONE), this condition now skips validation and emits no COMPRESSION option. For Parquet, Databend only supports ZSTD/SNAPPY for this option and the omitted-option default is ZSTD, so a user asking for no compression silently gets ZSTD-compressed output instead of the previous clear rejection. Keep Compression.NONE rejected for Parquet rather than treating it like None.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants