Request Handling in uscensus_pep_sex_race by niveditasing · Pull Request #2019 · datacommonsorg/data

niveditasing · 2026-05-20T11:09:06Z

Added error handling to the file loop so a single bad file logs an error and skips instead of crashing the script.

Changed schema mismatches to skip (continue) instead of throwing an error, keeping the script running for valid files.

Added .astype(str) before fixing commas in numeric columns to prevent pandas formatting errors.

Added protections around integer conversions to safely handle malformed or non-numeric data.

Cleaned up readability by fixing indentation and removing extra blank lines.
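The per-file handling described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `EXPECTED_COLUMNS`, `SOURCES`, and `load_valid_frames` are hypothetical names, and in-memory strings stand in for the downloaded Census files.

```python
import io
import logging

import pandas as pd

# Hypothetical schema and in-memory "files"; the real script reads Census downloads.
EXPECTED_COLUMNS = ["Year", "Count"]
SOURCES = {
    "good.csv": 'Year,Count\n1918,"1,234"\n',
    "wrong_schema.csv": "Foo,Bar\n1,2\n",
    "empty.csv": "",
}


def load_valid_frames(sources):
    """Return cleaned frames for readable files that match the expected schema."""
    frames = []
    for name, text in sources.items():
        try:
            df = pd.read_csv(io.StringIO(text))
        except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
            # One bad file is logged and skipped instead of stopping the run.
            logging.error("Skipping %s: %s", name, exc)
            continue
        if list(df.columns) != EXPECTED_COLUMNS:
            # Schema mismatch: log and skip rather than raise.
            logging.error("Skipping %s: unexpected columns %s", name, list(df.columns))
            continue
        # Cast to str first so .str.replace also works on columns parsed as numbers.
        df["Count"] = df["Count"].astype(str).str.replace(",", "", regex=False)
        frames.append(df)
    return frames


frames = load_valid_frames(SOURCES)
```

Only `good.csv` survives here; the other two inputs are logged and skipped, which is the trade-off the later review comments discuss (skipped files need to be visible in the logs).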

The differ reports 2 deletions for file national_before_2000.csv for year 1918 because the source URLs are not working.

PR checklist: https://docs.google.com/spreadsheets/d/1BzweR9Sj58j0H2_BweGTmfE4Z1lrjPZL8u1FS0kzCeg/edit?pli=1&gid=0#gid=0

gemini-code-assist

Code Review

This pull request updates the US Census PEP processing script to include per-file error handling and ensures string operations are applied correctly to data columns. However, the reviewer noted that the changes do not align with the PR description's mention of the requests library and list-based data collection. Further feedback identifies inefficiencies such as to_csv and pd.concat being called within the loop, the use of a bare except block, and the potential for silent failures by replacing exceptions with continue. Additionally, the reviewer pointed out that final_df columns are converted to strings without being cast back to integers.
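One way to address the in-loop `pd.concat` and `to_csv` concern is to collect frames in a list and combine and write them once after the loop. The sketch below assumes hypothetical per-file frames (`parsed`) and writes to an in-memory buffer instead of the script's real output path.

```python
import io

import pandas as pd

# Hypothetical per-file frames standing in for the script's parsed inputs.
parsed = [
    pd.DataFrame({"Year": [1901], "Count": [20]}),
    pd.DataFrame({"Year": [1900], "Count": [10]}),
]

frames = []
for df in parsed:
    # Appending to a list is cheap; no concat or to_csv inside the loop.
    frames.append(df)

# Concatenate and sort once, after all files have been processed.
final_df = (
    pd.concat(frames, ignore_index=True)
    .sort_values("Year")
    .reset_index(drop=True)
)

# Write once; a StringIO buffer replaces the real output file in this sketch.
buffer = io.StringIO()
final_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Calling `pd.concat` inside the loop copies the accumulated data on every iteration, so moving it outside keeps the work proportional to the total number of rows.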

niveditasing · 2026-05-20T11:58:44Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request introduces error handling and data type safety to the US Census PEP national data processing script by wrapping file processing in try-except blocks and ensuring string casting before character replacement. Feedback focuses on improving the robustness and efficiency of the script, specifically by addressing hardcoded filenames that cause data overwrites in loops, moving sorting operations outside of loops, and using more specific exception handling. There are also suggestions to maintain consistency in DataFrame concatenation and to ensure numeric type conversion is applied consistently across all processed DataFrames.
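The overwrite problem with a hardcoded filename can be illustrated with a small sketch that derives the output name from the year. The `national_{year}.csv` pattern, `frames_by_year`, and `write_per_year` are assumptions made for this example, not names taken from the PR.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical per-year frames; the naming pattern below is illustrative only.
frames_by_year = {
    1900: pd.DataFrame({"Year": [1900], "Count": [10]}),
    1901: pd.DataFrame({"Year": [1901], "Count": [20]}),
}


def write_per_year(frames_by_year, out_dir):
    """Write one CSV per year so later iterations do not overwrite earlier ones."""
    paths = []
    for year, df in frames_by_year.items():
        path = Path(out_dir) / f"national_{year}.csv"
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    write_per_year(frames_by_year, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
```

With a single fixed filename, only the last iteration's data would remain on disk; keying the name on the year keeps every file.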

niveditasing · 2026-05-20T12:26:03Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request improves the robustness of the US Census PEP data processing script by adding error handling for file reading, refining schema validation to skip problematic files instead of crashing, and updating output file naming to include specific years. It also refactors dataframe concatenation and sorting logic. Feedback identifies a potential bug in the type conversion logic where NaN values converted to strings would cause integer casting to fail, suggesting the use of 'pd.to_numeric' with 'errors="coerce"' instead. Additionally, it is recommended to use the 'thousands' parameter in 'pd.read_csv' to handle comma separators more efficiently.
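The two suggestions in this review can be sketched as follows. The sample values are made up for illustration; `pd.to_numeric(..., errors="coerce")` and the `thousands` parameter of `pd.read_csv` are the pandas features the review names.

```python
import io

import pandas as pd

# Made-up column with a comma separator, a missing value, and a non-numeric token.
raw = pd.Series(["1,234", None, "n/a", "56"])

# Why string-casting first is fragile: a NaN becomes the text "nan",
# which int() cannot parse.
try:
    int(str(float("nan")))
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

# Coerce instead: anything unparseable becomes a missing value, not an exception.
cleaned = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")
# The nullable integer dtype keeps integers while still allowing missing values.
counts = cleaned.astype("Int64")

# Alternatively, let read_csv strip thousands separators while parsing.
csv_text = 'Year,Count\n1918,"1,234"\n'
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
```

Using `thousands=","` removes the need for a separate comma-stripping pass, while `errors="coerce"` makes malformed cells show up as missing values that can be counted or logged.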

niveditasing added 2 commits May 20, 2026 11:05

code fix

041e4c2

Merge branch 'master' into added_firewall_logic

87349d4

niveditasing changed the title ~~Added a browser-like User-Agent header to the requests.head call in the add_future_year_urls function In uscensus_pep_sex_race~~ Request Handling in uscensus_pep_sex_race May 20, 2026

gemini-code-assist Bot reviewed May 20, 2026

code fix

1f2b83a

gemini-code-assist Bot reviewed May 20, 2026

Refactor national_1900_1970.py based on code assist recommendations

93fb52c

gemini-code-assist Bot reviewed May 20, 2026

Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py

Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py

Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated

niveditasing added 2 commits May 20, 2026 12:36

Use pd.to_numeric for more robust numeric cleaning in national_1900_1…

ad222d7

…970.py

code fix

f6652dc

niveditasing wants to merge 6 commits into datacommonsorg:master from niveditasing:added_firewall_logic
