fix: East Cambs: Use new API backend by tomsutch · Pull Request #2132 · robbrad/UKBinCollectionData

tomsutch · 2026-06-12T17:55:08Z

New AchieveForms API and 3-step flow necessary after changes to bin collection service in East Cambridgeshire from 1 June 2026.

Inspired by similar fix to mampfes/hacs_waste_collection_schedule (which is also MIT-licensed)

The new system appears not to return any collections for public buildings, such as the district council offices which was the test UPRN. Changed this to the Lamb Hotel, Ely.

Summary by CodeRabbit

Refactor
- Updated East Cambridgeshire Council bin collection data retrieval mechanism.

New AchieveForms API and 3-step flow necessary after changes to bin collection service in East Cambridgeshire from 1 June 2026. Inspired by similar fix to mampfes/hacs_waste_collection_schedule (which is also MIT-licensed) The new system appears not to return any collections for public buildings, such as the district council offices which was the test UPRN. Changed this to the Lamb Hotel, Ely.

coderabbitai · 2026-06-12T17:55:21Z

📝 Walkthrough

Walkthrough

EastCambridgeshireCouncil parser refactored from direct HTML scraping to an AchieveForms/APIBroker-based workflow. Session priming, token retrieval, and XML parsing replace prior HTML parsing logic. Test UPRN data updated with corresponding formatting adjustment.

Changes

East Cambridgeshire Council Parser Refactor

Layer / File(s)	Summary
Parse data refactor with AchieveForms workflow `uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`	`parse_data` method refactored from HTML scraping to three-step AchieveForms/APIBroker flow: session priming against forms endpoint, auth token extraction and retrieval, collections lookup using uprn and 42-day date window, XML response parsing of Row elements to extract bin type and collection date, with updated date formatting and row validation.
Test data UPRN update `uk_bin_collection/tests/input.json`	EastCambridgeshireCouncil test uprn value replaced; end-of-file formatting adjusted.

Possibly related PRs

robbrad/UKBinCollectionData#2110: Both PRs refactor EastCambridgeshireCouncil.py parse_data method, including session priming and request workflow changes.

Suggested reviewers

dp247

Poem

🐰 A council bin parser takes flight,
From HTML scraping to APIs bright,
AchieveForms dances with tokens in tow,
XML rows bloom in the API flow,
Forty-two days of collections aligned! 📦

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and accurately describes the main change: updating East Cambridgeshire Council's bin collection checker to use a new API backend, which is the primary refactoring in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tomsutch · 2026-06-12T17:57:12Z

Fixes #2117

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`:
- Around line 43-44: The code extracts a session id via sid_match =
re.search(...) and then immediately calls sid_match.group(1) which will raise
AttributeError if the regex fails; update the logic in the method in
EastCambridgeshireCouncil (where r is the priming response) to check if
sid_match is None and raise a clear exception (e.g., ValueError or RuntimeError)
with a descriptive message like "Failed to find auth-session in priming
response" (include response.status_code or snippet of r.text for debugging),
otherwise set sid = sid_match.group(1) as before.
- Around line 61-69: The current extraction of auth_token from r_auth.json()
silently returns an empty string if any nested key is missing (auth_data ->
integration -> transformed -> rows_data -> "0" -> AuthenticateResponse), so add
an explicit validation after parsing auth_data: attempt to extract auth_token
from auth_data (using the same nested keys or safer index checks), and if
auth_token is missing or falsy raise a clear exception (ValueError or a custom
error) with a message that includes the received auth_data (or enough context)
to make debugging easier; reference the variables r_auth, auth_data, and
auth_token when adding the check and raising the error so the failure is
explicit rather than allowing downstream cryptic errors.
- Around line 98-101: The code reads r_col.json() and directly indexes
col_data['data'] without checking the HTTP response or JSON contents; add
r_col.raise_for_status() immediately after the request (like Step 2 does), then
parse JSON inside a try/except catching JSONDecodeError and KeyError around
col_data = r_col.json(); if the 'data' key is missing log or raise a descriptive
error mentioning r_col.status_code and the response body, and only then pass
col_data['data'] into BeautifulSoup so failures are surfaced with context.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8ec0443f-a434-4075-a836-d13843471d99

📥 Commits

Reviewing files that changed from the base of the PR and between b65502c and 2906f82.

📒 Files selected for processing (2)

uk_bin_collection/tests/input.json
uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py

coderabbitai · 2026-06-12T17:58:57Z

+        sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)
+        sid = sid_match.group(1)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing null check on regex match causes AttributeError on unexpected response.

If the session priming response doesn't contain the expected auth-session pattern (e.g., API format changes, network error returns HTML error page), sid_match will be None and .group(1) raises AttributeError. Per project conventions, prefer explicit failures with clear error messages.

Proposed fix

sid_match = re.search(r'"auth-session":"([^"]+)"', r.text) + if not sid_match: + raise ValueError("Failed to extract auth-session from AchieveForms response") sid = sid_match.group(1)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)

sid = sid_match.group(1)

sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)

if not sid_match:

raise ValueError("Failed to extract auth-session from AchieveForms response")

sid = sid_match.group(1)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py` around lines 43 - 44, The code extracts a session id via sid_match = re.search(...) and then immediately calls sid_match.group(1) which will raise AttributeError if the regex fails; update the logic in the method in EastCambridgeshireCouncil (where r is the priming response) to check if sid_match is None and raise a clear exception (e.g., ValueError or RuntimeError) with a descriptive message like "Failed to find auth-session in priming response" (include response.status_code or snippet of r.text for debugging), otherwise set sid = sid_match.group(1) as before.

Source: Learnings

coderabbitai · 2026-06-12T17:58:57Z

+        r_auth.raise_for_status()
+        auth_data = r_auth.json()
+        auth_token = (
+            auth_data.get("integration", {})
+            .get("transformed", {})
+            .get("rows_data", {})
+            .get("0", {})
+            .get("AuthenticateResponse", "")
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Silent failure when auth token extraction fails.

The chained .get() calls return an empty string if any key is missing, allowing the flow to continue with an invalid/empty token. This will likely cause a cryptic failure downstream in Step 3. Per project conventions, raise an explicit exception when the expected response structure is missing.

Proposed fix

r_auth.raise_for_status() auth_data = r_auth.json() auth_token = ( auth_data.get("integration", {}) .get("transformed", {}) .get("rows_data", {}) .get("0", {}) .get("AuthenticateResponse", "") ) + if not auth_token: + raise ValueError("Failed to extract AuthenticateResponse token from API response")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py` around lines 61 - 69, The current extraction of auth_token from r_auth.json() silently returns an empty string if any nested key is missing (auth_data -> integration -> transformed -> rows_data -> "0" -> AuthenticateResponse), so add an explicit validation after parsing auth_data: attempt to extract auth_token from auth_data (using the same nested keys or safer index checks), and if auth_token is missing or falsy raise a clear exception (ValueError or a custom error) with a message that includes the received auth_data (or enough context) to make debugging easier; reference the variables r_auth, auth_data, and auth_token when adding the check and raising the error so the failure is explicit rather than allowing downstream cryptic errors.

Source: Learnings

coderabbitai · 2026-06-12T17:58:57Z

+        )
+
+        col_data = r_col.json()
+        soup = BeautifulSoup(col_data['data'], features="xml")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing raise_for_status() and error handling for collections response.

Step 3 (r_col) doesn't call raise_for_status() unlike Step 2, and directly accesses col_data['data'] which will raise a KeyError with no context if the key is missing. Add consistent error handling.

Proposed fix

timeout=30, ) - + r_col.raise_for_status() col_data = r_col.json() + if "data" not in col_data: + raise ValueError("Collections response missing 'data' field") soup = BeautifulSoup(col_data['data'], features="xml")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

)

col_data = r_col.json()

soup = BeautifulSoup(col_data['data'], features="xml")

)

r_col.raise_for_status()

col_data = r_col.json()

if "data" not in col_data:

raise ValueError("Collections response missing 'data' field")

soup = BeautifulSoup(col_data['data'], features="xml")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py` around lines 98 - 101, The code reads r_col.json() and directly indexes col_data['data'] without checking the HTTP response or JSON contents; add r_col.raise_for_status() immediately after the request (like Step 2 does), then parse JSON inside a try/except catching JSONDecodeError and KeyError around col_data = r_col.json(); if the 'data' key is missing log or raise a descriptive error mentioning r_col.status_code and the response body, and only then pass col_data['data'] into BeautifulSoup so failures are surfaced with context.

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: East Cambs: Use new API backend#2132

fix: East Cambs: Use new API backend#2132
tomsutch wants to merge 1 commit into
robbrad:masterfrom
tomsutch:fix/east-cambs-new-checker

tomsutch commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

tomsutch commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)
		sid = sid_match.group(1)

Conversation

tomsutch commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

tomsutch commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tomsutch commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading