human_text: escape inner double quotes and backslashes in change and unchanged values #48
Open
HrachShah wants to merge 2 commits into
added 2 commits
June 15, 2026 19:36
…r runs csv-diff against an empty file, csv.reader returns no rows and the previous code let StopIteration bubble out of next(fp). That produced a confusing traceback at the top of the call stack with no indication that the input was empty. The new try/except translates StopIteration into a typed ValueError with a descriptive message, so the CLI shows 'CSV input is empty (no header row found)' and downstream loaders / Click error handling can react to it explicitly.
…es internal double quotes and backslashes

human_text() wrapped prev/current values in literal "..." on the change line and the unchanged row summary, but only used plain str(value) in human_row() for added/removed rows. A value containing a double quote, e.g. hello "world", rendered as

    name: "hello "world"" => "goodbye "cruel" world"

The inner quotes were indistinguishable from the wrapping quotes, so a downstream reader could not tell where each value started or ended. The same ambiguity applied to backslash characters (a literal \ was emitted unescaped inside the quoted value).

The new _format_quoted() helper centralises the rendering: stringify, escape backslashes first, then double quotes, and wrap in a single pair of double quotes. The change lines and the unchanged summary now both go through it, so Cleo renders as "Cleo" (matching the existing convention for change values) and hello "world" renders as "hello \"world\"" instead of the previous ambiguous form. Non-string values (ints, None) stringified via str() keep the existing behaviour for change lines.
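The empty-input handling described in the first commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `read_header_sketch` is hypothetical, and only the error message is taken from the commit text.

```python
import csv
import io


def read_header_sketch(fp):
    """Return the header row from a CSV file object.

    Hypothetical helper for illustration only. It turns the StopIteration
    raised by next() on an exhausted reader into a ValueError with a
    descriptive message.
    """
    reader = csv.reader(fp)
    try:
        return next(reader)
    except StopIteration:
        # "from None" hides the StopIteration from the traceback, so the
        # caller only sees the descriptive ValueError.
        raise ValueError("CSV input is empty (no header row found)") from None


# A non-empty file yields its header row as a list of column names.
header = read_header_sketch(io.StringIO("id,name\n1,Cleo\n"))
```

A caller such as a CLI entry point could then catch `ValueError` and report the message instead of printing a traceback.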
csv-diff's `human_text()` rendered change lines and the unchanged summary by wrapping prev/current values in literal `"..."`, but did not escape characters inside the value. A row with a name like `hello "world"` rendered as `name: "hello "world"" => "goodbye "cruel" world"`. The inner double quotes were indistinguishable from the wrapping quotes, so a downstream reader could not parse the output. The same ambiguity applied to backslash characters.

The new `_format_quoted()` helper centralises the rendering: stringify the value, escape backslashes first, then double quotes, then wrap in one pair of double quotes. The change lines and the unchanged summary both go through it now, so:

- `hello "world"` renders as `"hello \"world\""` (unambiguous)
- `back\slash` renders as `"back\\slash"` (unambiguous)
- `Cleo` renders as `"Cleo"` (matches the existing convention for change values)

`human_row()` (used for added/removed rows) keeps its plain `key: value` format, since those rows are written line-by-line and don't have the value-pair ambiguity that the change line does.

`python3 -m pytest tests/`: 26 passed (24 baseline + 2 new regression tests).
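For readers who want to see the escaping order in action, here is a minimal sketch of the steps the description lists (stringify, escape backslashes, escape double quotes, wrap). The name `format_quoted_sketch` is hypothetical, and the body is an assumption about how such a helper could look, not the PR's actual `_format_quoted()` implementation.

```python
def format_quoted_sketch(value):
    """Render a value inside one unambiguous pair of double quotes.

    Illustrative only; follows the order given in the PR description.
    """
    text = str(value)
    # Backslashes must be doubled first. If quotes were escaped first, the
    # backslash added in front of each quote would itself be doubled here.
    text = text.replace("\\", "\\\\")
    # Then escape the inner double quotes.
    text = text.replace('"', '\\"')
    return '"' + text + '"'


print(format_quoted_sketch('hello "world"'))  # "hello \"world\""
print(format_quoted_sketch("back\\slash"))    # "back\\slash"
print(format_quoted_sketch("Cleo"))           # "Cleo"
```

With this order, a reader can reverse the rendering by scanning left to right and treating each backslash as escaping exactly the next character.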