Skip to content

fix: East Cambs: Use new API backend#2132

Open
tomsutch wants to merge 1 commit into
robbrad:masterfrom
tomsutch:fix/east-cambs-new-checker
Open

fix: East Cambs: Use new API backend#2132
tomsutch wants to merge 1 commit into
robbrad:masterfrom
tomsutch:fix/east-cambs-new-checker

Conversation

@tomsutch

@tomsutch tomsutch commented Jun 12, 2026

Copy link
Copy Markdown

New AchieveForms API and 3-step flow necessary after changes to bin collection service in East Cambridgeshire from 1 June 2026.

Inspired by similar fix to mampfes/hacs_waste_collection_schedule (which is also MIT-licensed)

The new system appears not to return any collections for public buildings, such as the district council offices which was the test UPRN. Changed this to the Lamb Hotel, Ely.

Summary by CodeRabbit

  • Refactor
    • Updated East Cambridgeshire Council bin collection data retrieval mechanism.

New AchieveForms API and 3-step flow necessary after changes to bin collection
service in East Cambridgeshire from 1 June 2026.

Inspired by similar fix to mampfes/hacs_waste_collection_schedule (which
is also MIT-licensed)

The new system appears not to return any collections for public buildings,
such as the district council offices which was the test UPRN. Changed this to
the Lamb Hotel, Ely.
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

EastCambridgeshireCouncil parser refactored from direct HTML scraping to an AchieveForms/APIBroker-based workflow. Session priming, token retrieval, and XML parsing replace prior HTML parsing logic. Test UPRN data updated with corresponding formatting adjustment.

Changes

East Cambridgeshire Council Parser Refactor

Layer / File(s) Summary
Parse data refactor with AchieveForms workflow
uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py
parse_data method refactored from HTML scraping to three-step AchieveForms/APIBroker flow: session priming against forms endpoint, auth token extraction and retrieval, collections lookup using uprn and 42-day date window, XML response parsing of Row elements to extract bin type and collection date, with updated date formatting and row validation.
Test data UPRN update
uk_bin_collection/tests/input.json
EastCambridgeshireCouncil test uprn value replaced; end-of-file formatting adjusted.

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐰 A council bin parser takes flight,
From HTML scraping to APIs bright,
AchieveForms dances with tokens in tow,
XML rows bloom in the API flow,
Forty-two days of collections aligned! 📦


🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main change: updating East Cambridgeshire Council's bin collection checker to use a new API backend, which is the primary refactoring in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tomsutch

Copy link
Copy Markdown
Author

Fixes #2117

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`:
- Around line 43-44: The code extracts a session id via sid_match =
re.search(...) and then immediately calls sid_match.group(1) which will raise
AttributeError if the regex fails; update the logic in the method in
EastCambridgeshireCouncil (where r is the priming response) to check if
sid_match is None and raise a clear exception (e.g., ValueError or RuntimeError)
with a descriptive message like "Failed to find auth-session in priming
response" (include response.status_code or snippet of r.text for debugging),
otherwise set sid = sid_match.group(1) as before.
- Around line 61-69: The current extraction of auth_token from r_auth.json()
silently returns an empty string if any nested key is missing (auth_data ->
integration -> transformed -> rows_data -> "0" -> AuthenticateResponse), so add
an explicit validation after parsing auth_data: attempt to extract auth_token
from auth_data (using the same nested keys or safer index checks), and if
auth_token is missing or falsy raise a clear exception (ValueError or a custom
error) with a message that includes the received auth_data (or enough context)
to make debugging easier; reference the variables r_auth, auth_data, and
auth_token when adding the check and raising the error so the failure is
explicit rather than allowing downstream cryptic errors.
- Around line 98-101: The code reads r_col.json() and directly indexes
col_data['data'] without checking the HTTP response or JSON contents; add
r_col.raise_for_status() immediately after the request (like Step 2 does), then
parse JSON inside a try/except catching JSONDecodeError and KeyError around
col_data = r_col.json(); if the 'data' key is missing log or raise a descriptive
error mentioning r_col.status_code and the response body, and only then pass
col_data['data'] into BeautifulSoup so failures are surfaced with context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8ec0443f-a434-4075-a836-d13843471d99

📥 Commits

Reviewing files that changed from the base of the PR and between b65502c and 2906f82.

📒 Files selected for processing (2)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py

Comment on lines +43 to +44
sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)
sid = sid_match.group(1)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing null check on regex match causes AttributeError on unexpected response.

If the session priming response doesn't contain the expected auth-session pattern (e.g., API format changes, network error returns HTML error page), sid_match will be None and .group(1) raises AttributeError. Per project conventions, prefer explicit failures with clear error messages.

Proposed fix
         sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)
+        if not sid_match:
+            raise ValueError("Failed to extract auth-session from AchieveForms response")
         sid = sid_match.group(1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)
sid = sid_match.group(1)
sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)
if not sid_match:
raise ValueError("Failed to extract auth-session from AchieveForms response")
sid = sid_match.group(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`
around lines 43 - 44, The code extracts a session id via sid_match =
re.search(...) and then immediately calls sid_match.group(1) which will raise
AttributeError if the regex fails; update the logic in the method in
EastCambridgeshireCouncil (where r is the priming response) to check if
sid_match is None and raise a clear exception (e.g., ValueError or RuntimeError)
with a descriptive message like "Failed to find auth-session in priming
response" (include response.status_code or snippet of r.text for debugging),
otherwise set sid = sid_match.group(1) as before.

Source: Learnings

Comment on lines +61 to +69
r_auth.raise_for_status()
auth_data = r_auth.json()
auth_token = (
auth_data.get("integration", {})
.get("transformed", {})
.get("rows_data", {})
.get("0", {})
.get("AuthenticateResponse", "")
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Silent failure when auth token extraction fails.

The chained .get() calls return an empty string if any key is missing, allowing the flow to continue with an invalid/empty token. This will likely cause a cryptic failure downstream in Step 3. Per project conventions, raise an explicit exception when the expected response structure is missing.

Proposed fix
         r_auth.raise_for_status()
         auth_data = r_auth.json()
         auth_token = (
             auth_data.get("integration", {})
             .get("transformed", {})
             .get("rows_data", {})
             .get("0", {})
             .get("AuthenticateResponse", "")
         )
+        if not auth_token:
+            raise ValueError("Failed to extract AuthenticateResponse token from API response")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`
around lines 61 - 69, The current extraction of auth_token from r_auth.json()
silently returns an empty string if any nested key is missing (auth_data ->
integration -> transformed -> rows_data -> "0" -> AuthenticateResponse), so add
an explicit validation after parsing auth_data: attempt to extract auth_token
from auth_data (using the same nested keys or safer index checks), and if
auth_token is missing or falsy raise a clear exception (ValueError or a custom
error) with a message that includes the received auth_data (or enough context)
to make debugging easier; reference the variables r_auth, auth_data, and
auth_token when adding the check and raising the error so the failure is
explicit rather than allowing downstream cryptic errors.

Source: Learnings

Comment on lines +98 to +101
)

col_data = r_col.json()
soup = BeautifulSoup(col_data['data'], features="xml")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing raise_for_status() and error handling for collections response.

Step 3 (r_col) doesn't call raise_for_status() unlike Step 2, and directly accesses col_data['data'] which will raise a KeyError with no context if the key is missing. Add consistent error handling.

Proposed fix
         timeout=30,
     )
-        
+    r_col.raise_for_status()
     col_data = r_col.json()
+    if "data" not in col_data:
+        raise ValueError("Collections response missing 'data' field")
     soup = BeautifulSoup(col_data['data'], features="xml")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
)
col_data = r_col.json()
soup = BeautifulSoup(col_data['data'], features="xml")
)
r_col.raise_for_status()
col_data = r_col.json()
if "data" not in col_data:
raise ValueError("Collections response missing 'data' field")
soup = BeautifulSoup(col_data['data'], features="xml")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`
around lines 98 - 101, The code reads r_col.json() and directly indexes
col_data['data'] without checking the HTTP response or JSON contents; add
r_col.raise_for_status() immediately after the request (like Step 2 does), then
parse JSON inside a try/except catching JSONDecodeError and KeyError around
col_data = r_col.json(); if the 'data' key is missing log or raise a descriptive
error mentioning r_col.status_code and the response body, and only then pass
col_data['data'] into BeautifulSoup so failures are surfaced with context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant