Threaded healthcheck server + take db off the critical path #8
Open
mahmoud wants to merge 4 commits into
- Combine _health_ok + _health_info into atomic _health_state tuple to prevent torn reads between HTTP handler and heartbeat threads
- Set health state to unhealthy during graceful shutdown
- Remove redundant _health_ok = False assignment
- Fix import ordering (alphabetical stdlib)
- Remove extra blank line before ThreadingWSGIServer
- Simplify test fixture: use real Config instead of mock.patch
- Remove stray files unrelated to healthcheck fix
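For readers following along, here is a rough sketch of what the first two bullets could look like. It is not the code from this PR: only the name `_health_state` comes from the commit message, while `set_health`, `get_health`, and `mark_shutting_down` are hypothetical helpers made up for illustration. The idea is that the writer publishes one immutable tuple with a single name rebind, so a reader can never pair a new `ok` flag with stale `info`.

```python
# Sketch only: module-level health state published as ONE tuple.
# With two separate globals (_health_ok, _health_info), a reader could
# observe the new flag together with the old info dict (a torn read).
_health_state = (False, {"status": "starting"})


def set_health(ok, info):
    """Publish a new (ok, info) pair with a single name rebind."""
    global _health_state
    # Copy the dict so later mutation by the caller cannot leak in.
    _health_state = (bool(ok), dict(info))


def get_health():
    """Read the current pair; the global is loaded once, then unpacked."""
    ok, info = _health_state
    return ok, info


def mark_shutting_down():
    """Hypothetical hook for graceful shutdown: report unhealthy first."""
    set_health(False, {"status": "shutting down"})
```

Rebinding one name to a fresh tuple avoids needing a lock for this particular read/write pattern in CPython; if the state ever grows into something mutated in place, a `threading.Lock` would be the safer choice.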
Circling back: I've been running this in production for months now. The only times I've been alerted were when the queue worker was actually down (OOM'd due to business logic :P).
Fixing a longstanding nuisance in my production env: Render times out the healthcheck after only 5 seconds, and the basic server just can't keep up with that. Not sure if it's due to db variability or just jankiness of the non-threaded healthcheck server, so this PR addresses both. I just deployed this in my staging environment, so feel free to treat this as a draft for now while I let the reliability bake. :)
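A minimal sketch of the overall shape, assuming a stdlib `wsgiref` server (not the actual diff): a background heartbeat thread does the slow db check and caches the result, and the HTTP handler only reads the cached value, so a slow db can't push a response past the 5-second timeout. `ThreadingWSGIServer` and `_health_state` are names from the commits; `check_db`, `heartbeat`, `health_app`, `QuietHandler`, and `start_health_server` are hypothetical stand-ins.

```python
import json
import threading
import time
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """Handle each request in its own thread so one slow client can't block others."""
    daemon_threads = True


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # keep healthcheck polling out of the logs


# Cached (ok, info) pair; written by the heartbeat, read by the handler.
_health_state = (False, {"status": "starting"})


def check_db():
    """Hypothetical stand-in for the real (possibly slow) db probe."""
    return True


def heartbeat(stop, interval=1.0):
    """Run the db check off the request path and cache the result."""
    global _health_state
    while not stop.is_set():
        try:
            ok = check_db()
            info = {"status": "ok" if ok else "db check failed"}
        except Exception as exc:
            ok, info = False, {"status": f"error: {exc}"}
        _health_state = (ok, info)
        stop.wait(interval)


def health_app(environ, start_response):
    """WSGI app: answer from the cached state only, never touch the db."""
    ok, info = _health_state
    body = json.dumps(info).encode("utf-8")
    status = "200 OK" if ok else "503 Service Unavailable"
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def start_health_server(host="127.0.0.1", port=0):
    """Start the heartbeat and the threaded server; returns (server, stop_event)."""
    server = make_server(host, port, health_app,
                         server_class=ThreadingWSGIServer,
                         handler_class=QuietHandler)
    stop = threading.Event()
    threading.Thread(target=heartbeat, args=(stop,), daemon=True).start()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, stop
```

One tradeoff of this shape: the endpoint reports the result of the last heartbeat rather than a live check, so staleness is bounded by `interval`. A real version would probably also want to treat an overdue heartbeat as unhealthy.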