Skip to content

Request Handling in uscensus_pep_sex_race#2019

Open
niveditasing wants to merge 6 commits into
datacommonsorg:masterfrom
niveditasing:added_firewall_logic
Open

Request Handling in uscensus_pep_sex_race#2019
niveditasing wants to merge 6 commits into
datacommonsorg:masterfrom
niveditasing:added_firewall_logic

Conversation

@niveditasing
Copy link
Copy Markdown
Contributor

@niveditasing niveditasing commented May 20, 2026

Added error handling to the file loop so a single bad file logs an error and skips instead of crashing the script.

Changed schema mismatches to skip (continue) instead of throwing an error, keeping the script running for valid files.

Added .astype(str) before fixing commas in numeric columns to prevent pandas formatting errors.

Added protections around integer conversions to safely handle malformed or non-numeric data.

Cleaned up readability by fixing indentation and removing extra blank lines.

Differ has 2 deletions for file national_before_2000.csv for year 1918 due to urls not working

PR checlist :https://docs.google.com/spreadsheets/d/1BzweR9Sj58j0H2_BweGTmfE4Z1lrjPZL8u1FS0kzCeg/edit?pli=1&gid=0#gid=0

@niveditasing niveditasing changed the title Added a browser-like User-Agent header to the requests.head call in the add_future_year_urls function In uscensus_pep_sex_race Request Handling in uscensus_pep_sex_race May 20, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the US Census PEP processing script to include per-file error handling and ensures string operations are applied correctly to data columns. However, the reviewer noted that the changes do not align with the PR description's mention of the requests library and list-based data collection. Further feedback identifies inefficiencies such as to_csv and pd.concat being called within the loop, the use of a bare except block, and the potential for silent failures by replacing exceptions with continue. Additionally, the reviewer pointed out that final_df columns are converted to strings without being cast back to integers.

Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
@niveditasing
Copy link
Copy Markdown
Contributor Author

@gemini-code-assist review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces error handling and data type safety to the US Census PEP national data processing script by wrapping file processing in try-except blocks and ensuring string casting before character replacement. Feedback focuses on improving the robustness and efficiency of the script, specifically by addressing hardcoded filenames that cause data overwrites in loops, moving sorting operations outside of loops, and using more specific exception handling. There are also suggestions to maintain consistency in DataFrame concatenation and to ensure numeric type conversion is applied consistently across all processed DataFrames.

Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
@niveditasing
Copy link
Copy Markdown
Contributor Author

@gemini-code-assist review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the robustness of the US Census PEP data processing script by adding error handling for file reading, refining schema validation to skip problematic files instead of crashing, and updating output file naming to include specific years. It also refactors dataframe concatenation and sorting logic. Feedback identifies a potential bug in the type conversion logic where NaN values converted to strings would cause integer casting to fail, suggesting the use of 'pd.to_numeric' with 'errors="coerce"' instead. Additionally, it is recommended to use the 'thousands' parameter in 'pd.read_csv' to handle comma separators more efficiently.

Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py
Comment thread scripts/us_census/pep/us_pep_sexrace/national/national_1900_1970.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant