Skip to content

Fix Watershed class: persist instance state and parse JSON#245

Draft
thodson-usgs wants to merge 7 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/streamstats-watershed-class
Draft

Fix Watershed class: persist instance state and parse JSON#245
thodson-usgs wants to merge 7 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/streamstats-watershed-class

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented May 3, 2026

Summary

  • Watershed.from_streamstats_json was a classmethod that mutated cls instead of building an instance, causing every watershed to share class-level state and the method to return the class itself rather than a Watershed.
  • Watershed.__init__(rcode, x, y) called get_watershed(...) and discarded the result, so a freshly-constructed instance had no attributes of its own.
  • get_watershed(format="object") returned None instead of the parsed JSON dict its docstring promised.

This PR makes from_streamstats_json return a real instance, makes __init__ populate self via get_watershed(format="object"), and makes format="object" actually return the parsed JSON. The legacy _workspaceID attribute is preserved as a read-only alias for the new workspace_id.

While here:

  • Dropped the dead format == "shape" branch (a stale Fiona-stub pass that fell through to a Watershed despite documenting a different return type).
  • Made the format contract explicit: added a "watershed" case alongside "geojson"/"object", and unrecognized values now raise ValueError. Previously any unrecognized format silently fell through to a Watershed — the same bug shape that masked the broken "shape" branch.
  • Fixed get_sample_watershed to actually return a Watershed instance (preexisting docstring/behavior mismatch — it called get_watershed(...) with the default format="geojson" and so returned the raw Response).
  • Switched json.loads(r.text) to r.json() to match the convention used in nldi.py/nwis.py.
  • Rewrote the get_watershed narrative docstring to describe the multi-return contract (Response / dict / Watershed selected by format) instead of the previous "always returns a watershed" wording, which contradicted the actual default behavior.

Minimal reproducible examples

Reconstructed inline because the live StreamStats endpoint is currently returning 404 across requests; the bug shapes are pure-Python and don't depend on the API.

Bug 1: from_streamstats_json mutates the class

class BuggyWatershed:
    @classmethod
    def from_streamstats_json(cls, blob):
        cls.workspace_id = blob["workspaceID"]   # mutates the class
        cls.parameters = blob["parameters"]
        return cls                                # returns the class itself

a = BuggyWatershed.from_streamstats_json({"workspaceID": "WS_A", "parameters": ["a"]})
b = BuggyWatershed.from_streamstats_json({"workspaceID": "WS_B", "parameters": ["b"]})
print(a is BuggyWatershed)                       # True — not an instance!
print(a.workspace_id, b.workspace_id)            # WS_B WS_B — shared class-level state

Bug 2: __init__ discards the work

class BuggyWatershed:
    def __init__(self, rcode, x, y):
        _ = get_watershed(rcode, x, y)            # result discarded

ws = BuggyWatershed("MA", -72.7, 42.6)
print(hasattr(ws, "workspace_id"))                # False

Bug 3: format="object" returns None

def buggy_get_watershed(rcode, x, y, format="geojson"):
    if format == "geojson":
        return _response
    elif format == "object":
        pass                                      # implicit None

After this PR, format="object" returns the parsed dict, Watershed.from_streamstats_json returns a fresh instance with isolated state, and Watershed(...) populates self.

Test plan

  • One regression test in tests/streamstats_test.py covering the load-bearing behavior change: two watersheds built from different JSON must not share state via class-level attributes (test_from_streamstats_json_does_not_mutate_class). The other behaviors — from_streamstats_json returns an instance, format="object" returns a dict, __init__ populates self, _workspaceID resolves — are each one-line obvious fixes; their tests were dropped as trivial.
  • Full local test suite passes.
  • Reviewer: live-API verification was attempted but the StreamStats watershed.geojson endpoint is currently returning HTTP 404 for all tested coordinates (service-side issue independent of this PR). Suggested follow-up: re-run the existing test_get_watershed integration once StreamStats recovers.

Follow-up

Filed a separate issue to discuss whether get_watershed should always return a Watershed (drop the format parameter). That would match the function name and the original narrative-docstring promise, but is a breaking change and out of scope for this bug-fix PR.

…ershed

`Watershed.from_streamstats_json` was a classmethod that assigned to `cls`
instead of an instance, so every watershed shared class-level attributes
and the method returned the class itself. `Watershed.__init__` called
`get_watershed(...)` and discarded the result, so instances had no state
of their own. And `get_watershed(format="object")` returned `None` rather
than the parsed JSON the docstring promised.

`from_streamstats_json` now builds a real instance, `__init__` requests
`format="object"` and populates `self`, and `get_watershed(format="object")`
returns the parsed dict. `_workspaceID` is preserved as a read-only alias
of the new `workspace_id` attribute.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes dataretrieval.streamstats.Watershed construction/parsing so instances hold their own state and get_watershed(format="object") returns parsed JSON as advertised, with new regression tests to prevent class-level state leaks.

Changes:

  • Make get_watershed(format="object") return a parsed JSON dict (instead of None) and reuse the parsed data for Watershed creation.
  • Rework Watershed.from_streamstats_json to construct and populate a real instance (no class-level mutation), and make Watershed.__init__ populate instance attributes.
  • Add tests/streamstats_test.py covering the regressions and the _workspaceID back-compat alias.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
dataretrieval/streamstats.py Fixes JSON parsing/return behavior and corrects Watershed instance construction/state population.
tests/streamstats_test.py Adds regression tests for instance state, object-format return, init population, and _workspaceID aliasing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dataretrieval/streamstats.py Outdated
Comment on lines 139 to 145
data = json.loads(r.text)

if format == "object":
# return a python object
pass
return data

data = json.loads(r.text)
return Watershed.from_streamstats_json(data)

thodson-usgs and others added 4 commits May 4, 2026 10:04
Per copilot review on PR DOI-USGS#245.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Per /simplify review on PR DOI-USGS#245:
- Replace `json.loads(r.text)` with `r.json()` (matching the convention
  used in `nldi.py` and `nwis.py`) and drop the now-unused `import
  json` from `dataretrieval/streamstats.py`.
- Remove the dead `format == "shape"` branch — a stale Fiona-stub
  (`# use Fiona to return a shape object` + `pass`) that silently fell
  through to `Watershed.from_streamstats_json(data)`. The docstring no
  longer mentions `"shape"` so removing the branch matches the contract.
- Drop four trivial unit tests that each exercised a one-line behavior
  change obvious from the diff (`from_streamstats_json` returns an
  instance, `format="object"` returns a dict, `__init__` populates self,
  `_workspaceID` alias resolves). Keep the load-bearing one:
  `test_from_streamstats_json_does_not_mutate_class` guards against
  re-introducing the class-mutation antipattern under refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Previously `get_watershed` had two named formats (`"geojson"`, `"object"`)
and any other value silently fell through to a `Watershed` — the same
bug shape that masked the dead `"shape"` branch removed earlier in this
PR. Add an explicit `"watershed"` case and raise `ValueError` on
unrecognized values so a typo'd format fails loudly instead of
returning whatever the fallthrough is.

Also fix `get_sample_watershed`, whose docstring promises a `Watershed`
but called `get_watershed(...)` with the default `format="geojson"`
(returns the raw `requests.Response`). Pass `format="watershed"`
explicitly so the function matches its documented return type.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +106 to +110
format: string, optional
Selects the return shape. ``"geojson"`` (default) returns the raw
``requests.Response``; ``"object"`` returns the parsed JSON ``dict``;
``"watershed"`` returns a :obj:`Watershed` instance built from the
parsed JSON. Any other value raises ``ValueError``.
Comment on lines 135 to +148
if format == "geojson":
return r

if format == "shape":
# use Fiona to return a shape object
pass
data = r.json()

if format == "object":
# return a python object
pass
return data

if format == "watershed":
return Watershed.from_streamstats_json(data)

data = json.loads(r.text)
return Watershed.from_streamstats_json(data)
raise ValueError(
f"Invalid format {format!r}; expected 'geojson', 'object', or 'watershed'."
)
Per Copilot review: the narrative section claimed `get_watershed`
"Returns a watershed object", but with the explicit `format` contract
the function actually returns a `requests.Response` (default), a
`dict`, or a `Watershed` depending on `format`. Rewrite the narrative
to describe the three return shapes accurately.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants