utils.query: surface unhandled 4xx/5xx and stop mutating caller's payload#253
Draft
thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
Draft
utils.query: surface unhandled 4xx/5xx and stop mutating caller's payload#253thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
Conversation
… payload Two related correctness fixes to the shared HTTP wrapper used by every module in the package. * The function only formatted explicit messages for 400, 404, 414, and 500/502/503. Any other status (401, 403, 405, 408, 429, 501, 504, …) fell through and the response was returned as if it had succeeded — callers parsed an HTML error page as RDB or CSV. Add a ``raise_for_status()`` after the explicit branches so every non-success surfaces, while keeping the friendlier messages for the codes we already format. * The body did ``for key, value in payload.items(): payload[key] = to_str(...)``, mutating the caller's dict. List values came back replaced with their stringified joins, breaking any caller that re-used the dict. Build a fresh ``params`` dict in a comprehension and leave the input alone. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Contributor
There was a problem hiding this comment.
Pull request overview
Improves the shared dataretrieval.utils.query() HTTP wrapper to (1) avoid mutating the caller-provided query payload and (2) surface previously-silent non-success HTTP statuses so downstream parsers don’t attempt to parse HTML error bodies as data.
Changes:
- Build a new
paramsdict forrequests.get(..., params=...)instead of mutating the inputpayload. - Add a
response.raise_for_status()catch-all after existing friendly status-code branches. - Add regression tests for payload immutability and for raising on additional 4xx/5xx statuses.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
dataretrieval/utils.py |
Stops mutating the caller’s payload and ensures unhandled 4xx/5xx statuses raise instead of returning an error body as “data”. |
tests/utils_test.py |
Adds regression tests for payload immutability and for raising on additional previously-unhandled HTTP error statuses. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
163
to
167
| url: string | ||
| URL to query | ||
| payload: dict | ||
| query parameters passed to ``requests.get`` | ||
| query parameters passed to ``requests.get``. Not mutated. | ||
| delimiter: string |
|
|
||
| # Catch-all for any other 4xx/5xx (401, 403, 405, 408, 429, 501, 504, ...) | ||
| # so callers don't silently receive an HTML error page as if it were data. | ||
| response.raise_for_status() |
Comment on lines
+72
to
+73
| with pytest.raises(Exception): # noqa: B017 -- HTTPError or ValueError | ||
| utils.query(url, {"k": "v"}) |
Per copilot review on PR DOI-USGS#253: - Re-raise the catch-all 4xx/5xx as ValueError with status/reason/url for parity with the explicit 400/404/414/500/502/503 branches; callers no longer need to handle both ValueError and HTTPError from the same helper. - Update the Returns section to document the actual return type (requests.Response) and the ValueError raise contract. - Narrow the test from pytest.raises(Exception) to pytest.raises( ValueError) with a match on the status code to assert the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related correctness fixes to the shared HTTP wrapper used by every module in the package.
Unhandled 4xx/5xx silently passed through.
queryonly formatted explicit messages for 400, 404, 414, and 500/502/503 — every other status (401, 403, 405, 408, 429, 501, 504, …) fell through and the response was returned as if it had succeeded. Callers then parsed an HTML error page as RDB or CSV and reported confusing downstream errors. Adds aresponse.raise_for_status()-style fallback after the explicit branches so every non-success surfaces, while keeping the friendlier messages for the codes we already format.querymutated the caller's payload dict. The body didfor key, value in payload.items(): payload[key] = to_str(...)in place, so list values came back replaced with their stringified joins after the call returned. Build a freshparamsdict in a comprehension and leave the input alone.Minimal reproducible examples
Bug 1: pre-fix payload mutation
After this PR, the caller's dict is left untouched; only the request-time
paramsdict carries the joined string.Bug 2: pre-fix 4xx/5xx pass-through
After this PR, any non-success status raises
ValueError("HTTP 405 Method Not Allowed for ...")so the failure surfaces immediately.Test plan
test_url_too_longandtest_headercontinue to pass.API_USGS_PATset).