
fix(scraper): RBI schema resilience + auto-sync banknames.json across…#465

Open
PriyankaMarbill wants to merge 1 commit into master from fix/ISS-1697801-bank-name-rbi-schema-sync

Conversation

@PriyankaMarbill

Contributor

… SDKs [ISS-1697801]

Three coordinated changes that together solve the first issue on ISS-1697801 ("Bank Name Sync Issue / RBI Schema Change").

  1. Read the bank-name column tolerantly. RBI periodically renames the bank-name header in the NEFT/RTGS sheets. The scraper used to read row['BANK'] directly, so a rename silently set every bank name to nil. parse_csv now reads via read_bank_name(), which tries a list of known header variants (BANK, BANK NAME, BANK_NAME, Bank Name, ...) and normalises the value back into row['BANK']. A WARN fires if none of the variants match, so the next rename is caught immediately instead of failing silently.

  2. Never drop a bank just because banknames.json is stale. merge_dataset used to overwrite combined_data['BANK'] with the value returned by bank_name_from_code (which reads only our local banknames.json), so any bank RBI added that we had not registered yet ended up with no name and effectively disappeared. It now falls back to the value we captured from the RBI sheet when the local lookup is empty, and surfaces the new bank via a WARN.

  3. Auto-propagate new banks to every language SDK.

    • After the export, generate.rb calls sync_banknames!(dataset). For every IFSC, it derives the 4-char bank code and appends it to src/banknames.json if missing, using the name from the RBI sheet.
    • The name is passed through normalize_bank_name() which applies the CONTRIBUTING.md "Bank Names Guidelines": drop trailing 'Ltd'/'Limited' (rule 1), drop leading 'The ' (rule 3), canonicalise to 'Co-operative' with no trailing period (rules 4 + 9), 'sahkari' -> 'Sahakari' (rule 11), and collapse SHOUTY-CASE to Title Case while preserving short acronyms (rule 2). Rules that need human judgement (city-in-brackets, Grameen/Gramin spelling, unexpanded abbreviations) are flagged via a WARN that points at CONTRIBUTING.md.
    • When any new bank is added, generate.rb invokes make generate-constants, which already regenerates bank.rb, Bank.php, bank.js and constants.go from banknames.json via the Go template generator. End result: a single scraper run picks up a new bank from RBI, registers it in the source of truth (banknames.json), and updates all four SDK files in lockstep, with no manual make step required.
    • A new make check-constants target regenerates the four files and fails with a diff if they drift from banknames.json. Intended to run in CI so that hand-edits to banknames.json without a regen, or hand-edits to a generated file, fail loudly.
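Changes 1 and 2 above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the header list contains only the four variants named above (the rest are elided in the description), rows are plain Hashes rather than parsed CSV rows, and resolve_bank_name is a hypothetical stand-in for the fallback inside merge_dataset, with a Hash replacing the banknames.json lookup done by bank_name_from_code.

```ruby
# Header variants named in the PR description; the real list may be longer.
BANK_NAME_HEADERS = ['BANK', 'BANK NAME', 'BANK_NAME', 'Bank Name'].freeze

# Returns the bank name found under the first known header, or nil.
# Warns when none of the known headers is present in the row.
def read_bank_name(row)
  header = BANK_NAME_HEADERS.find { |h| row.key?(h) }
  if header.nil?
    warn "WARN: no known bank-name header in #{row.to_h.keys.inspect}"
    return nil
  end
  value = row[header].to_s.strip
  value.empty? ? nil : value
end

# Hypothetical helper: prefer the locally registered name, and fall back to
# the name captured from the RBI sheet when the local lookup is empty.
def resolve_bank_name(ifsc, sheet_name, known_banks)
  code = ifsc[0, 4] # the first four characters of an IFSC are the bank code
  local = known_banks[code]
  return local unless local.nil? || local.empty?

  warn "WARN: #{code} is not registered locally, using RBI name #{sheet_name.inspect}"
  sheet_name
end

# Example: the sheet uses 'Bank Name' instead of 'BANK', and the bank is new.
row = { 'IFSC' => 'ABCD0000001', 'Bank Name' => ' Example Bank ' }
row['BANK'] = read_bank_name(row)
name = resolve_bank_name(row['IFSC'], row['BANK'], { 'HDFC' => 'HDFC Bank' })
puts name
```

Writing the value back into row['BANK'] means downstream code can keep reading a single key, which is the reason the description gives for normalising the header.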

Verification done locally:

  • ruby -c (syntax check) passes on methods.rb and generate.rb
  • Dry-run of sync_banknames! on a synthetic dataset (existing bank, brand-new bank, bank with nil name) -> correctly added the new bank, skipped the nil one, kept banknames.json sorted with 2-space indent
  • 14-case unit run of normalize_bank_name covering the automated CONTRIBUTING.md guidelines (rules 1, 2, 3, 4, 9, 11, acronym preservation, nil/empty input) -> all pass
  • make -n check-constants expands to the expected go run invocations plus the git-diff guard
  • End-to-end Go generator run requires Go, which is not available in this sandbox; will run in CI / on the maintainer's machine
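For reviewers who want to reproduce the synthetic dry-run described above, a minimal sketch could look like this. It rests on several assumptions: the dataset is taken to be a Hash keyed by IFSC, sync_banknames is a hypothetical pure variant of sync_banknames! that returns the updated mapping instead of writing src/banknames.json, and normalize_bank_name here is a partial stand-in that only strips a leading 'The ' and a trailing 'Ltd'/'Limited' (rules 3 and 1); the version in the PR applies more rules.

```ruby
require 'json'

# Partial stand-in: handles only a leading 'The ' and a trailing 'Ltd'/'Limited'.
def normalize_bank_name(name)
  return nil if name.nil? || name.strip.empty?

  name.strip.sub(/\AThe\s+/i, '').sub(/[\s,]+(?:Ltd|Limited)\.?\z/i, '')
end

# Hypothetical pure variant of sync_banknames!: returns the sorted mapping
# and the list of bank codes that were added.
def sync_banknames(banknames, dataset)
  result = banknames.dup
  added = []
  dataset.each do |ifsc, row|
    code = ifsc[0, 4] # 4-char bank code
    next if result.key?(code)

    name = normalize_bank_name(row['BANK'])
    if name.nil?
      warn "WARN: #{code} has no name in the RBI sheet, skipping"
      next
    end
    result[code] = name
    added << code
  end
  [result.sort.to_h, added]
end

banknames = { 'HDFC' => 'HDFC Bank' }
dataset = {
  'HDFC0000001' => { 'BANK' => 'HDFC BANK LTD' },         # existing bank
  'ABCD0000001' => { 'BANK' => 'The Example Bank Ltd.' }, # brand-new bank
  'WXYZ0000001' => { 'BANK' => nil }                      # bank with nil name
}

updated, added = sync_banknames(banknames, dataset)
puts JSON.pretty_generate(updated) # pretty_generate indents with two spaces
```

Existing entries are never overwritten in this sketch, so a name already curated in banknames.json takes precedence over whatever the RBI sheet says.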

Refs: ISS-1697801
Compliance: DOCUMENTATION.md (release rules), CONTRIBUTING.md (bank names + code style + build-must-pass)



Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@PriyankaMarbill added the TestingNotRequired label (TestingNotRequired label for BVT) on May 14, 2026