fix: BaberghDistrictCouncil - support new collection-day table format#2122
fix: BaberghDistrictCouncil - support new collection-day table format#2122pacso wants to merge 1 commit into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughBabergh scraper now uses a module-level Liferay portlet namespace for element IDs, waits for and selects namespaced address dropdown options by postcode/paon patterns, submits the namespaced form, and parses collection days from a results table using full-weekday date parsing. ChangesBabergh Portlet Integration and Parsing Overhaul
Sequence Diagram(s)sequenceDiagram
participant Client_parse_data as parse_data
participant Browser_WebDriver as WebDriver
participant Address_Dropdown as AddressDropdown
participant Results_Page as ResultsTable
parse_data->>WebDriver: enter `{PORTLET_NS}postcode` and click `{PORTLET_NS}btnAddressLookup`
WebDriver->>AddressDropdown: wait for `{PORTLET_NS}uprn` options to populate
parse_data->>AddressDropdown: select best-matching option (postcode + paon regex)
parse_data->>WebDriver: click `{PORTLET_NS}fcd_submit`
WebDriver->>ResultsTable: wait for `div.collection-days-page table`
ResultsTable->>parse_data: return rows of (collection type, date cells) parsed with `%A %d %b %Y`
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py (2)
1-1: ⚡ Quick winRemove unused
timeimport.The
timemodule is imported but never used in this file. All waits are handled by Selenium'sWebDriverWait.♻️ Proposed fix
-import time from datetime import datetime🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py` at line 1, Remove the unused import of the time module in the BaberghDistrictCouncil.py file: locate the top-level import statement "import time" and delete it since no functions or methods (e.g., any class or function definitions within BaberghDistrictCouncil.py) reference time and all waits use Selenium's WebDriverWait; run linting/tests to confirm no further references remain.
119-154: ⚡ Quick winConsider adding logging for skipped table rows and cells.
The parser silently skips rows/cells that don't meet expected criteria (lines 127, 131, 143) using
continuestatements. While this defensive approach prevents crashes, it could hide site format changes without alerting developers or users.Based on learnings, council parsers consumed by Home Assistant should prefer defensive fallbacks with
logger.warningover raising exceptions, ensuring resilience while still surfacing unexpected markup changes during debugging.♻️ Proposed improvement with logging
+import logging + +logger = logging.getLogger(__name__) + # ... (in parse_data method) results_page = soup.find("div", class_="collection-days-page") if results_page: table = results_page.find("table") if table: body = table.find("tbody") or table for row in body.find_all("tr"): cells = row.find_all("td") if len(cells) < 2: + logger.warning(f"Skipping table row with fewer than 2 cells: {row}") continue collection_type = cells[0].get_text(strip=True) if not collection_type: + logger.warning(f"Skipping row with empty collection type: {row}") continue # ... date extraction ... for date_cell in date_cells: date_text = date_cell.get_text(strip=True) if not date_text: + logger.warning(f"Skipping empty date cell in {collection_type} row") continue🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py` around lines 119 - 154, The parser in BaberghDistrictCouncil (the loop that iterates rows in the results_page table) silently skips malformed rows/cells at the three continue points; add logger.warning calls instead of silent continues so unexpected markup changes are surfaced. Specifically, in the loop inside the method that builds data["bins"] (the for row in body.find_all("tr") loop), replace the silent continues after: a) if len(cells) < 2, b) if not collection_type, and c) if not date_text with logger.warning messages that include identifying context (e.g., number of cells, row HTML snippet or an index, the raw cells content, and the problematic date_text) so operators can see when rows are skipped but maintain the same continue behavior after logging. Ensure you use the module logger instance used elsewhere in the file (logger) and keep messages concise and consistent.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py`:
- Line 89: The current string match in option_text that appends f",
{paon_str}A," only handles an "A" suffix and should be fixed; update the
construction that uses paon_str and option_text (the spot building the ",
{paon_str}A," fragment) to either remove this special-case fragment entirely or
generalize it to match any single-letter suffix (e.g., replace the literal "A"
with a character class like [A-Za-z] or [A-Z] in the pattern used), and ensure
the matching code treats letters consistently (case if relevant) so addresses
like "1B", "1C", etc. will match when paon="1".
---
Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py`:
- Line 1: Remove the unused import of the time module in the
BaberghDistrictCouncil.py file: locate the top-level import statement "import
time" and delete it since no functions or methods (e.g., any class or function
definitions within BaberghDistrictCouncil.py) reference time and all waits use
Selenium's WebDriverWait; run linting/tests to confirm no further references
remain.
- Around line 119-154: The parser in BaberghDistrictCouncil (the loop that
iterates rows in the results_page table) silently skips malformed rows/cells at
the three continue points; add logger.warning calls instead of silent continues
so unexpected markup changes are surfaced. Specifically, in the loop inside the
method that builds data["bins"] (the for row in body.find_all("tr") loop),
replace the silent continues after: a) if len(cells) < 2, b) if not
collection_type, and c) if not date_text with logger.warning messages that
include identifying context (e.g., number of cells, row HTML snippet or an
index, the raw cells content, and the problematic date_text) so operators can
see when rows are skipped but maintain the same continue behavior after logging.
Ensure you use the module logger instance used elsewhere in the file (logger)
and keep messages concise and consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e7951fdc-e9ab-4ece-bd80-c2ed3c7f4f86
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/BaberghDistrictCouncil.py
Address review feedback on PR robbrad#2122: - Replace the brittle, "A"-only house-number suffix special case with a regex that matches the house name/number as a whole token with an optional single-letter suffix (e.g. "1A", "1B") and either a space or comma separator. - Remove the now-unused `import time` (all waits use WebDriverWait). Co-Authored-By: Claude Opus 4.8 <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2122 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Harness. |
The Babergh collection-day finder page changed significantly: - Form fields now use Liferay portlet ids instead of aria labels - A new "Find collection days" submit step is required after selecting an address - Results moved from card markup to a table in div.collection-days-page - Dates are now formatted like "Monday 08 Jun 2026" Rewrite the parser to drive the new flow and read the results table row-by-row, which captures all bin types including the newly added ones. Match the address by whole-token house name/number with an optional single-letter suffix, and fail loud if the results table yields no collections (an empty parse means the format changed rather than there being no upcoming collections). Verified against the live site; output validates against the schema and the integration test passes. Co-Authored-By: Claude Opus 4.8 <[email protected]>
27a3be4 to
0261c41
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai resume |
✅ Action performedReviews resumed. |
|
@coderabbitai review |
✅ Action performedReview finished.
|
Summary
The Babergh District Council collection-day finder page changed significantly, breaking the existing parser (the integration test fails with a
TimeoutExceptionon the very first step). This PR rewrites the parser to handle the new page.What changed on the council site
..._postcode,..._btnAddressLookup,..._uprn) instead ofaria-labelattributes /lfr-btn-labelclass...._fcd_submit); previously the results loaded automatically.div.collection-cards→div.card→<h3>/<p>markup to a<table class="table">insidediv.collection-days-page.Mon 27 Apr 2026(%a %d %b %Y) toMonday 08 Jun 2026(%A %d %b %Y).What this PR does
Testing
Verified against the live site with Selenium + headless Chrome:
TimeoutException(test fails).BaberghDistrictCouncilintegration test passes; output validates againstoutput.schema.Returns 12 collections across 5 types for the test address (Little Changes, CO6 4RA): Food Waste, Paper And Card, Recycling, Garden Waste (Brown Bin), Refuse (General Rubbish). Multiple future dates per type are fine — the Home Assistant coordinator already collapses them to the earliest future date per type.
🤖 Generated with Claude Code
Summary by CodeRabbit