fix: East Cambs: Use new API backend#2132
Conversation
New AchieveForms API and 3-step flow necessary after changes to bin collection service in East Cambridgeshire from 1 June 2026. Inspired by similar fix to mampfes/hacs_waste_collection_schedule (which is also MIT-licensed) The new system appears not to return any collections for public buildings, such as the district council offices which was the test UPRN. Changed this to the Lamb Hotel, Ely.
📝 WalkthroughWalkthroughEastCambridgeshireCouncil parser refactored from direct HTML scraping to an AchieveForms/APIBroker-based workflow. Session priming, token retrieval, and XML parsing replace prior HTML parsing logic. Test UPRN data updated with corresponding formatting adjustment. ChangesEast Cambridgeshire Council Parser Refactor
Possibly related PRs
Suggested reviewers
Poem
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Fixes #2117 |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`:
- Around line 43-44: The code extracts a session id via sid_match =
re.search(...) and then immediately calls sid_match.group(1) which will raise
AttributeError if the regex fails; update the logic in the method in
EastCambridgeshireCouncil (where r is the priming response) to check if
sid_match is None and raise a clear exception (e.g., ValueError or RuntimeError)
with a descriptive message like "Failed to find auth-session in priming
response" (include response.status_code or snippet of r.text for debugging),
otherwise set sid = sid_match.group(1) as before.
- Around line 61-69: The current extraction of auth_token from r_auth.json()
silently returns an empty string if any nested key is missing (auth_data ->
integration -> transformed -> rows_data -> "0" -> AuthenticateResponse), so add
an explicit validation after parsing auth_data: attempt to extract auth_token
from auth_data (using the same nested keys or safer index checks), and if
auth_token is missing or falsy raise a clear exception (ValueError or a custom
error) with a message that includes the received auth_data (or enough context)
to make debugging easier; reference the variables r_auth, auth_data, and
auth_token when adding the check and raising the error so the failure is
explicit rather than allowing downstream cryptic errors.
- Around line 98-101: The code reads r_col.json() and directly indexes
col_data['data'] without checking the HTTP response or JSON contents; add
r_col.raise_for_status() immediately after the request (like Step 2 does), then
parse JSON inside a try/except catching JSONDecodeError and KeyError around
col_data = r_col.json(); if the 'data' key is missing log or raise a descriptive
error mentioning r_col.status_code and the response body, and only then pass
col_data['data'] into BeautifulSoup so failures are surfaced with context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8ec0443f-a434-4075-a836-d13843471d99
📒 Files selected for processing (2)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py
| sid_match = re.search(r'"auth-session":"([^"]+)"', r.text) | ||
| sid = sid_match.group(1) |
There was a problem hiding this comment.
Missing null check on regex match causes AttributeError on unexpected response.
If the session priming response doesn't contain the expected auth-session pattern (e.g., API format changes, network error returns HTML error page), sid_match will be None and .group(1) raises AttributeError. Per project conventions, prefer explicit failures with clear error messages.
Proposed fix
sid_match = re.search(r'"auth-session":"([^"]+)"', r.text)
+ if not sid_match:
+ raise ValueError("Failed to extract auth-session from AchieveForms response")
sid = sid_match.group(1)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| sid_match = re.search(r'"auth-session":"([^"]+)"', r.text) | |
| sid = sid_match.group(1) | |
| sid_match = re.search(r'"auth-session":"([^"]+)"', r.text) | |
| if not sid_match: | |
| raise ValueError("Failed to extract auth-session from AchieveForms response") | |
| sid = sid_match.group(1) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`
around lines 43 - 44, The code extracts a session id via sid_match =
re.search(...) and then immediately calls sid_match.group(1) which will raise
AttributeError if the regex fails; update the logic in the method in
EastCambridgeshireCouncil (where r is the priming response) to check if
sid_match is None and raise a clear exception (e.g., ValueError or RuntimeError)
with a descriptive message like "Failed to find auth-session in priming
response" (include response.status_code or snippet of r.text for debugging),
otherwise set sid = sid_match.group(1) as before.
Source: Learnings
| r_auth.raise_for_status() | ||
| auth_data = r_auth.json() | ||
| auth_token = ( | ||
| auth_data.get("integration", {}) | ||
| .get("transformed", {}) | ||
| .get("rows_data", {}) | ||
| .get("0", {}) | ||
| .get("AuthenticateResponse", "") | ||
| ) |
There was a problem hiding this comment.
Silent failure when auth token extraction fails.
The chained .get() calls return an empty string if any key is missing, allowing the flow to continue with an invalid/empty token. This will likely cause a cryptic failure downstream in Step 3. Per project conventions, raise an explicit exception when the expected response structure is missing.
Proposed fix
r_auth.raise_for_status()
auth_data = r_auth.json()
auth_token = (
auth_data.get("integration", {})
.get("transformed", {})
.get("rows_data", {})
.get("0", {})
.get("AuthenticateResponse", "")
)
+ if not auth_token:
+ raise ValueError("Failed to extract AuthenticateResponse token from API response")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`
around lines 61 - 69, The current extraction of auth_token from r_auth.json()
silently returns an empty string if any nested key is missing (auth_data ->
integration -> transformed -> rows_data -> "0" -> AuthenticateResponse), so add
an explicit validation after parsing auth_data: attempt to extract auth_token
from auth_data (using the same nested keys or safer index checks), and if
auth_token is missing or falsy raise a clear exception (ValueError or a custom
error) with a message that includes the received auth_data (or enough context)
to make debugging easier; reference the variables r_auth, auth_data, and
auth_token when adding the check and raising the error so the failure is
explicit rather than allowing downstream cryptic errors.
Source: Learnings
| ) | ||
|
|
||
| col_data = r_col.json() | ||
| soup = BeautifulSoup(col_data['data'], features="xml") |
There was a problem hiding this comment.
Missing raise_for_status() and error handling for collections response.
Step 3 (r_col) doesn't call raise_for_status() unlike Step 2, and directly accesses col_data['data'] which will raise a KeyError with no context if the key is missing. Add consistent error handling.
Proposed fix
timeout=30,
)
-
+ r_col.raise_for_status()
col_data = r_col.json()
+ if "data" not in col_data:
+ raise ValueError("Collections response missing 'data' field")
soup = BeautifulSoup(col_data['data'], features="xml")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ) | |
| col_data = r_col.json() | |
| soup = BeautifulSoup(col_data['data'], features="xml") | |
| ) | |
| r_col.raise_for_status() | |
| col_data = r_col.json() | |
| if "data" not in col_data: | |
| raise ValueError("Collections response missing 'data' field") | |
| soup = BeautifulSoup(col_data['data'], features="xml") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@uk_bin_collection/uk_bin_collection/councils/EastCambridgeshireCouncil.py`
around lines 98 - 101, The code reads r_col.json() and directly indexes
col_data['data'] without checking the HTTP response or JSON contents; add
r_col.raise_for_status() immediately after the request (like Step 2 does), then
parse JSON inside a try/except catching JSONDecodeError and KeyError around
col_data = r_col.json(); if the 'data' key is missing log or raise a descriptive
error mentioning r_col.status_code and the response body, and only then pass
col_data['data'] into BeautifulSoup so failures are surfaced with context.
New AchieveForms API and 3-step flow necessary after changes to bin collection service in East Cambridgeshire from 1 June 2026.
Inspired by similar fix to mampfes/hacs_waste_collection_schedule (which is also MIT-licensed)
The new system appears not to return any collections for public buildings, such as the district council offices which was the test UPRN. Changed this to the Lamb Hotel, Ely.
Summary by CodeRabbit