Set validation error times to current time to avoid race condition #1637
landonshumway-ia wants to merge 2 commits into
There is a race condition that sometimes occurs when a small number of invalid licenses is uploaded. We have an error email reporter that sends out license upload failure emails every 15 minutes, and this reporting process looks at the last 15 minutes of events in the DB to see if any upload errors occurred. While beta testing, a state uploaded a single invalid license record. The upload failure event was placed on the data event bus, which placed a message on an SQS queue for processing. That queue has a batch size of 10, and it will wait for up to 5 minutes to process messages if the batch size is not met. The single upload failure notification sat in the queue long enough that, by the time it was stored in the DB, the notification error reporter had already completed its scan of that 15-minute window and never detected the error. This race condition exists in all three compact codebases (JCC, Cosm, and SW). It can also occur if there is a large CSV upload and all of the errors are toward the end of the file. Previously, we were setting the same timestamp for all row failures. This updates the timestamp to use the actual time that the row was processed and the error was detected.
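The missed-window behavior can be modeled in a few lines. This is a minimal sketch, not the project's reporter: it assumes the reporter runs on fixed 15-minute boundaries and selects events whose timestamp falls in the half-open window (run - 15 min, run]. The `StoredEvent` fields `event_time` and `stored_at` and the `events_reported` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

REPORT_WINDOW = timedelta(minutes=15)


@dataclass
class StoredEvent:
    event_time: datetime  # hypothetical: timestamp written on the failure event
    stored_at: datetime   # hypothetical: when the event actually landed in the DB


def events_reported(events, run_times):
    """Return the events a windowed reporter would pick up across its runs."""
    reported = []
    for run in run_times:
        window_start = run - REPORT_WINDOW
        for event in events:
            # Only rows already written to the DB are visible at run time.
            visible = event.stored_at <= run
            in_window = window_start < event.event_time <= run
            if visible and in_window:
                reported.append(event)
    return reported


base = datetime(2024, 1, 1, 12, 0)
runs = [base + REPORT_WINDOW * i for i in range(3)]  # 12:00, 12:15, 12:30
late = StoredEvent(base + timedelta(minutes=13), base + timedelta(minutes=17))
prompt = StoredEvent(base + timedelta(minutes=13), base + timedelta(minutes=14))
print(events_reported([late], runs))
print(len(events_reported([prompt], runs)))
```

An event stamped at 12:13 but written to the DB at 12:17 is invisible to the 12:15 run (not yet stored) and outside the 12:30 run's window, so it is never reported. The same event written at 12:14 is picked up by the 12:15 run.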
jlkravitz
left a comment
One question, but otherwise, great catch!
jlkravitz
left a comment
@isabeleliassen Good to merge.
There is a race condition that sometimes occurs when a small number of invalid licenses is uploaded in a CSV file. We have an error email reporter that sends out license upload failure emails every 15 minutes, and this reporting process looks at the last 15 minutes of events in the DB to see if any upload errors occurred. While beta testing, a state uploaded a single invalid license record. The upload failure event was placed on the data event bus, which placed a message on an SQS queue for processing. That queue has a batch size of 10, and it will wait for up to 5 minutes to process messages if the batch size is not met. The single upload failure notification sat in the queue long enough that, by the time it was stored in the DB, the notification error reporter had already completed its scan of that 15-minute window and never detected the error. This race condition exists in all three compact codebases (JCC, Cosm, and SW). To address this, this change reduces the batch window for the data event processor queue from 5 minutes to 5 seconds.
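A simplified model of the queue's batching behavior shows why a lone message waits out the full window. It assumes a batch is released as soon as 10 messages have arrived or the batching window has elapsed since the first message, whichever comes first, and it ignores other limits such as the payload size cap. `delivery_time` is a hypothetical helper, not part of any AWS SDK.

```python
from datetime import datetime, timedelta


def delivery_time(arrivals, batch_size, max_window):
    """Approximate when the first batch of queued messages reaches the consumer.

    Simplified model (assumption): the batch is released when `batch_size`
    messages have arrived or when `max_window` has elapsed since the first
    arrival, whichever happens first. Payload-size limits are ignored.
    """
    ordered = sorted(arrivals)
    deadline = ordered[0] + max_window
    if len(ordered) >= batch_size and ordered[batch_size - 1] <= deadline:
        # The batch filled up before the window expired.
        return ordered[batch_size - 1]
    # Otherwise the consumer waits out the whole batching window.
    return deadline


start = datetime(2024, 1, 1, 12, 0)
print(delivery_time([start], 10, timedelta(minutes=5)))  # 2024-01-01 12:05:00
print(delivery_time([start], 10, timedelta(seconds=5)))  # 2024-01-01 12:00:05
```

With a single failure message, the 5-minute window delays delivery by the full 5 minutes, while a 5-second window delivers it almost immediately. A burst of 10 messages that all arrive within the window is delivered as soon as the tenth arrives. If the queue feeds a Lambda function through an event source mapping (an assumption; the description does not say), this window is the mapping's `MaximumBatchingWindowInSeconds` setting.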
This can also occur if there is a large CSV upload and all of the errors are toward the end of the file. Previously, we were setting the same timestamp for all row failures, so if a large upload took 15 minutes to process and the errors were toward the end of the file, they wouldn't be stored in the database until after the ingest error reporter had already scanned the 15-minute window containing that shared timestamp, and the errors were never reported. This change sets each failure's timestamp to the actual time that the row was processed and the error was detected.
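The difference between a shared timestamp and a per-row timestamp can be sketched with an injectable clock. This is a hypothetical illustration, not the project's ingest code: it assumes the old shared timestamp was captured once when processing began, and `FakeClock` and `build_failure_events` are made-up names. Each row is modeled as taking one second to validate.

```python
from datetime import datetime, timedelta


class FakeClock:
    """Hypothetical test clock; each advance() models one row's validation time."""

    def __init__(self, start, step):
        self.current = start
        self.step = step

    def now(self):
        return self.current

    def advance(self):
        self.current += self.step


def build_failure_events(rows, clock, per_row_time):
    """Return (row_id, event_time) pairs for rows that fail validation."""
    upload_time = clock.now()  # captured once, before any row is processed
    events = []
    for row_id, is_valid in rows:
        clock.advance()  # simulate the time spent validating this row
        if not is_valid:
            # Old behavior reuses upload_time; new behavior reads the clock now.
            event_time = clock.now() if per_row_time else upload_time
            events.append((row_id, event_time))
    return events


start = datetime(2024, 1, 1, 12, 0)
rows = [(i, i != 899) for i in range(900)]  # 900 rows, only the last one is invalid
one_second = timedelta(seconds=1)
print(build_failure_events(rows, FakeClock(start, one_second), per_row_time=False))
print(build_failure_events(rows, FakeClock(start, one_second), per_row_time=True))
```

With the shared timestamp, the failure on the last row is stamped 12:00 even though it was detected at 12:15, so it is already 15 minutes old when written. The per-row timestamp records 12:15, which keeps the event inside the reporter's current window.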
Testing List
- `yarn test:unit:all` should run without errors or warnings
- `yarn serve` should run without errors or warnings
- `yarn build` should run without errors or warnings
- `backend/compact-connect/tests/unit/test_api.py`
- Run `compact-connect/bin/download_oas30.py`

Closes #1633