fix(shared-runtime): guard shutdown() against Tokio TLS destruction #2169
rachelyangdog wants to merge 1 commit
…uring CPython finalization

During CPython interpreter finalization, thread-local storage is destroyed before atexit handlers fire. SharedRuntime::shutdown() calls runtime.block_on(), which internally calls context::enter() to set up Tokio's CONTEXT thread-local. If that TLS slot is already destroyed, context::enter() panics with "The Tokio context thread-local variable has been destroyed", which PyO3 converts to a pyo3_runtime.PanicException. This crashes every uWSGI worker on shutdown when using ddtrace >=4.9.x.

Fix: before calling block_on(), check whether Handle::try_current() returns an error for which is_thread_local_destroyed() is true. If TLS is gone, return Ok(()) early; the OS will clean up the remaining Tokio threads on process exit. This eliminates both the panic and the subsequent 60s hang/SIGKILL.

Reproducer: uWSGI app with lazy-apps=true, ddtrace imported via uwsgi import=, 4 workers. SIGTERM triggers the panic on every worker.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
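To make the failure mode concrete, here is a minimal std-only sketch of the same guard pattern. It does not use Tokio: `Context`, `CONTEXT`, `shutdown()` and `demo()` are hypothetical stand-ins, and the assumption is that a destroyed Tokio context behaves like a destroyed `std::thread_local!` slot, where `with` panics and `try_with` returns an error instead.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for per-thread runtime state; not Tokio's real type.
struct Context {
    // Where the destructor records what shutdown() did during TLS teardown.
    teardown_outcome: Arc<Mutex<Option<&'static str>>>,
}

thread_local! {
    // Stand-in for a runtime's context thread-local.
    static CONTEXT: RefCell<Option<Context>> = RefCell::new(None);
}

/// Guarded shutdown: probe the slot with `try_with`, which reports a destroyed
/// slot as an error, before doing work that relies on the panicking `with`.
fn shutdown() -> &'static str {
    if CONTEXT.try_with(|_| ()).is_err() {
        return "skipped";
    }
    // Stands in for block_on(): `with` panics if the slot is destroyed.
    CONTEXT.with(|_| ());
    "completed"
}

impl Drop for Context {
    fn drop(&mut self) {
        // Runs while this thread's CONTEXT slot is being destroyed.
        *self.teardown_outcome.lock().unwrap() = Some(shutdown());
    }
}

/// Calls shutdown() once on a live thread and once from the TLS destructor.
fn demo() -> (&'static str, Option<&'static str>) {
    let teardown_outcome = Arc::new(Mutex::new(None));
    let recorder = Arc::clone(&teardown_outcome);
    let live_outcome = thread::spawn(move || {
        CONTEXT.with(|slot| {
            *slot.borrow_mut() = Some(Context { teardown_outcome: recorder });
        });
        shutdown()
    })
    .join()
    .unwrap();
    let during_teardown = *teardown_outcome.lock().unwrap();
    (live_outcome, during_teardown)
}
```

The first call to `shutdown()` runs while the thread's slot is alive. The second runs from the slot's own destructor, where the probe fails and the guard takes the early-return path instead of panicking.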
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5b9a9050b
    tokio::runtime::Handle::try_current(),
    Err(ref e) if e.is_thread_local_destroyed()
) {
    debug!("Tokio TLS destroyed during interpreter finalization, skipping shutdown");
    return Ok(());
Take the runtime before skipping TLS-destroyed shutdown
In this branch shutdown() returns success without taking self.runtime or clearing the registered workers. The condition only proves that the calling thread's Tokio TLS has been destroyed, which can also happen from thread-local destructors during ordinary thread teardown or embedded interpreter finalization while the process continues; in that case the runtime remains available after a successful shutdown and its background workers can keep running until a later drop aborts them without Worker::shutdown(). Please mark the runtime as shut down or perform a non-TLS background shutdown before returning.
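One way to read this suggestion is sketched below, with std-only stand-ins. `Runtime`, `SharedRuntime`, `Outcome` and the `tls_destroyed` flag are all hypothetical and do not reflect the crate's actual types; the flag stands in for the `Handle::try_current()` check. The runtime is taken out of its slot before the TLS check, so a later call sees it as already shut down.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the Tokio runtime plus its registered workers.
struct Runtime {
    workers: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Workers were shut down normally (stands in for the block_on path).
    Graceful(usize),
    /// TLS was gone: the runtime was taken, but no graceful shutdown ran.
    SkippedTlsDestroyed,
    /// A previous call already took the runtime.
    AlreadyShutDown,
}

struct SharedRuntime {
    runtime: Mutex<Option<Runtime>>,
}

impl SharedRuntime {
    fn new(workers: &[&str]) -> Self {
        let workers = workers.iter().map(|w| w.to_string()).collect();
        SharedRuntime { runtime: Mutex::new(Some(Runtime { workers })) }
    }

    fn is_running(&self) -> bool {
        self.runtime.lock().unwrap().is_some()
    }

    /// `tls_destroyed` stands in for the Handle::try_current() check.
    fn shutdown(&self, tls_destroyed: bool) -> Outcome {
        // Take the runtime first so every path leaves it marked as shut down.
        let runtime = match self.runtime.lock().unwrap().take() {
            Some(runtime) => runtime,
            None => return Outcome::AlreadyShutDown,
        };
        if tls_destroyed {
            // How to dispose of the taken runtime on this path is left open;
            // this sketch simply drops it.
            drop(runtime);
            return Outcome::SkippedTlsDestroyed;
        }
        Outcome::Graceful(runtime.workers.len())
    }
}
```

With this ordering, `SharedRuntime::new(&["a", "b"]).shutdown(true)` leaves `is_running()` false, which is the property the comment asks for. Whether dropping the runtime on the TLS-destroyed path is acceptable is a separate question that the sketch does not settle.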
Artifact Size Benchmark Report
- aarch64-alpine-linux-musl
- aarch64-unknown-linux-gnu
- libdatadog-x64-windows
- libdatadog-x86-windows
- x86_64-alpine-linux-musl
- x86_64-unknown-linux-gnu