Set validation error times to current time to avoid race condition #1637

Open
landonshumway-ia wants to merge 2 commits into csg-org:main from InspiringApps:fix/validation-error-reporting-race-condition

Conversation

@landonshumway-ia

@landonshumway-ia landonshumway-ia commented Jun 10, 2026

Collaborator

There is a race condition that sometimes occurs when a low volume of invalid licenses is uploaded in a CSV file. We have an error email reporter that sends out license upload failure emails every 15 minutes, and this reporting process looks at the last 15 minutes of events in the DB to see if any upload errors occurred. While beta testing, a state uploaded a single invalid license record. The upload failure event was placed on the data event bus, which placed a message on an SQS queue for processing. That queue has a batch size of 10, and it will wait up to 5 minutes to process messages if the batch size is not met. The one upload failure notification sat in the queue long enough that, by the time it was stored in the DB, the error reporter had already completed its scan of that 15-minute window and never detected the error. This race condition exists in all three compact codebases (JCC, Cosm, and SW). This change reduces the batch window for the data event processor queue from 5 minutes to 5 seconds to address this.
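
To make the timing concrete, here is a minimal sketch of the race under a simplified model. It assumes the reporter sees an event only if the event's timestamp falls inside the scan's 15-minute look-back window and the record has already been written when the scan runs. The function name `is_reported`, the scan schedule, and the example times are hypothetical and are not taken from the codebase.

```python
from datetime import datetime, timedelta, timezone

# Look-back window of the error reporter (15 minutes, per the description above).
SCAN_WINDOW = timedelta(minutes=15)


def is_reported(event_time, queue_delay, scan_times):
    """Return True if any scan would pick up the event (simplified model).

    Assumptions (hypothetical, for illustration only):
    - the record is written to the DB at event_time + queue_delay
    - a scan at time T only considers events stamped in [T - 15 min, T)
    - a scan can only see records already written when it runs
    """
    stored_at = event_time + queue_delay
    for scan in scan_times:
        in_window = scan - SCAN_WINDOW <= event_time < scan
        if in_window and stored_at <= scan:
            return True
    return False


# Hypothetical schedule: scans at 12:00, 12:15, and 12:30 UTC.
first_scan = datetime(2026, 6, 10, 12, 0, tzinfo=timezone.utc)
scans = [first_scan + i * SCAN_WINDOW for i in range(3)]

# A single failure event stamped one minute before the 12:00 scan.
event = datetime(2026, 6, 10, 11, 59, tzinfo=timezone.utc)

missed_with_old_window = not is_reported(event, timedelta(minutes=5), scans)
seen_with_new_window = is_reported(event, timedelta(seconds=5), scans)
```

With the 5-minute delay, the record lands after the 12:00 scan, and the 12:15 scan looks back only to 12:00, so the 11:59 event is never seen. With the 5-second delay, the record is already in place when the 12:00 scan runs.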

This can also occur if there is a large CSV upload and all of the errors are toward the end of the file. Previously, we set the same timestamp for all row failures. If a large upload took 15 minutes to process and the errors were toward the end of the file, they weren't stored in the database until after the ingest error reporter had scanned that 15-minute window, so the errors were never reported. This change sets the timestamp to the actual time that the row was processed and the error was detected.
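
As a rough illustration of the per-row timestamp idea, the sketch below stamps each failure with the time the row was actually checked instead of one shared upload timestamp. The names `collect_failures`, `validate_license`, and the `eventTime` key are assumptions made for this example and do not reflect the real handler in `bulk_upload.py`.

```python
from datetime import datetime, timezone


def utc_now():
    """Current time in UTC; injectable so tests can supply a fake clock."""
    return datetime.now(timezone.utc)


def validate_license(row):
    """Hypothetical validator: returns an error message, or None if valid."""
    return None if row.get("licenseNumber") else "missing licenseNumber"


def collect_failures(rows, validate=validate_license, now=utc_now):
    """Validate rows in order, stamping each failure when it is detected.

    Reading the clock inside the loop (rather than once before it) means a
    failure near the end of a long upload gets a recent timestamp, so it
    falls inside the reporter's next look-back window.
    """
    failures = []
    for row_number, row in enumerate(rows, start=1):
        error = validate(row)
        if error is not None:
            failures.append(
                {"row": row_number, "error": error, "eventTime": now()}
            )
    return failures
```

Passing `now` as a parameter is only a convenience for testing: a fake clock can stand in for a slow upload without the test actually waiting.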

Testing List

  • yarn test:unit:all should run without errors or warnings
  • yarn serve should run without errors or warnings
  • yarn build should run without errors or warnings
  • For API configuration changes: CDK tests added/updated in backend/compact-connect/tests/unit/test_api.py
  • For API endpoint changes: OpenAPI spec updated to show latest endpoint configuration run compact-connect/bin/download_oas30.py
  • Code review

Closes #1633

There is a race condition that sometimes occurs when a low volume of invalid
licenses is uploaded. We have an error email reporter that sends out license
upload failure emails every 15 minutes, and this reporting process looks at the
last 15 minutes of events in the DB to see if any upload errors occurred. While
beta testing, a state uploaded a single invalid license record. The upload failure
event was placed on the data event bus, which placed a message on an SQS queue for
processing. That queue has a batch size of 10, and it will wait up to 5 minutes
to process messages if the batch size is not met. The one upload failure
notification sat in the queue long enough that, by the time it was stored in
the DB, the error reporter had already completed its scan of that 15-minute
window and never detected the error. This race condition exists in all three
compact codebases (JCC, Cosm, and SW).

This can also occur if there is a large CSV upload and all of the errors are
toward the end of the file. Previously, we were setting the same timestamp
for all row failures. This updates the timestamp to use the actual time that
the row was processed and the error was detected.
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Contributor

Warning

Review limit reached

@landonshumway-ia, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 31 minutes and 41 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f134e0b8-e9c3-46d4-9e93-4a41e889e713

📥 Commits

Reviewing files that changed from the base of the PR and between 625029d and de4f383.

📒 Files selected for processing (14)
  • backend/compact-connect/lambdas/python/common/tests/unit/test_utils.py
  • backend/compact-connect/lambdas/python/provider-data-v1/handlers/bulk_upload.py
  • backend/compact-connect/lambdas/python/provider-data-v1/tests/function/test_handlers/test_bulk_upload.py
  • backend/compact-connect/lambdas/python/provider-data-v1/tests/unit/test_handlers/test_bulk_upload_unit.py
  • backend/compact-connect/stacks/persistent_stack/data_event_table.py
  • backend/cosmetology-app/lambdas/python/common/tests/unit/test_utils.py
  • backend/cosmetology-app/lambdas/python/provider-data-v1/handlers/bulk_upload.py
  • backend/cosmetology-app/lambdas/python/provider-data-v1/tests/function/test_handlers/test_bulk_upload.py
  • backend/cosmetology-app/lambdas/python/provider-data-v1/tests/unit/test_handlers/test_bulk_upload_unit.py
  • backend/cosmetology-app/stacks/persistent_stack/data_event_table.py
  • backend/social-work-app/lambdas/python/common/tests/unit/test_utils.py
  • backend/social-work-app/lambdas/python/provider-data-v1/handlers/bulk_upload.py
  • backend/social-work-app/lambdas/python/provider-data-v1/tests/function/test_handlers/test_bulk_upload.py
  • backend/social-work-app/stacks/persistent_stack/data_event_table.py

@ChiefStief ChiefStief left a comment

Collaborator


Looks good!

@landonshumway-ia landonshumway-ia changed the title from "Set validation error times to current time" to "Set validation error times to current time to avoid race condition" on Jun 10, 2026

@jlkravitz jlkravitz left a comment

Collaborator


One question, but otherwise, great catch!

Comment thread backend/social-work-app/stacks/persistent_stack/data_event_table.py

@jlkravitz jlkravitz left a comment

Collaborator


@isabeleliassen Good to merge.

Development

Successfully merging this pull request may close these issues.

Address race condition for license upload failure email notifications

3 participants