Fix Sub::Quote weak reference cleanup #759

Closed
fglock wants to merge 3 commits into master from fix/sub-quote-leaks

fglock (Owner) commented May 19, 2026

Fix Sub::Quote leaks.t tests by clearing weak refs to CODE objects

Problem

The Sub::Quote leaks.t tests were failing because weak references to CODE objects were never cleared, preventing Sub::Quote::CLONE from cleaning up expired entries.

Solution

Modified WeakRefRegistry.clearWeakRefsTo() to always clear weak refs to CODE objects by passing true instead of false to the includeCode parameter. This allows Sub::Quote::CLONE to properly clean up expired entries.
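
The effect of the flag can be illustrated with a minimal sketch. This is not the actual PerlOnJava code: the class `WeakRefRegistrySketch`, the `Kind` enum, the `Referent` and `WeakSlot` types, and the map layout are all assumptions made for illustration. Only the method name `clearWeakRefsTo` and the `includeCode` parameter come from this PR.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the registry; NOT the real WeakRefRegistry.
class WeakRefRegistrySketch {
    // Hypothetical referent kinds: CODE objects versus everything else.
    enum Kind { CODE, OTHER }

    // An object that weak references can point at.
    static final class Referent {
        final Kind kind;
        Referent(Kind kind) { this.kind = kind; }
    }

    // One weak reference; target is set to null when the ref is cleared
    // (the Perl-level equivalent of the weak ref becoming undef).
    static final class WeakSlot {
        Referent target;
        WeakSlot(Referent target) { this.target = target; }
    }

    // Identity-keyed map from referent to the weak refs that point at it.
    private final Map<Referent, List<WeakSlot>> weakRefsByReferent = new IdentityHashMap<>();

    // Registers a new weak reference to the given referent.
    WeakSlot weaken(Referent referent) {
        WeakSlot slot = new WeakSlot(referent);
        weakRefsByReferent.computeIfAbsent(referent, k -> new ArrayList<>()).add(slot);
        return slot;
    }

    // Clears every weak ref to the referent and returns how many were cleared.
    // CODE referents are skipped unless includeCode is true.
    int clearWeakRefsTo(Referent referent, boolean includeCode) {
        if (referent.kind == Kind.CODE && !includeCode) {
            return 0; // weak refs to CODE stay live in this case
        }
        List<WeakSlot> slots = weakRefsByReferent.remove(referent);
        if (slots == null) {
            return 0;
        }
        for (WeakSlot slot : slots) {
            slot.target = null;
        }
        return slots.size();
    }
}
```

In this sketch, calling `clearWeakRefsTo(code, false)` on a CODE referent leaves its weak refs intact, while `clearWeakRefsTo(code, true)` clears them, which is the condition a cleanup routine that looks for expired entries would depend on.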

Test Results

  • leaks.t: All 9 tests now pass (previously failed 4 tests)
  • quotify.t: All 2595 tests pass
  • hints.t: 2 tests still failing due to a known limitation with caller(0) returning empty warning bits in main script context (documented in lexical-warnings.md)
  • sub-defer.t and sub-quote.t: Timeout/OutOfMemoryError - this is a pre-existing issue that occurs even without any changes to WeakRefRegistry or ScalarUtil

OOM Investigation

The OOM error in sub-defer.t and sub-quote.t was investigated extensively:

  • Confirmed to be pre-existing (occurs even without any changes)
  • Tried optimizing clearAllBlessedWeakRefs by filtering during snapshot to reduce memory pressure - did not help
  • Tried adding PJ_SKIP_WEAK_CLEAR environment variable to skip weak ref cleanup - did not help
  • Tried increasing heap size with JPERL_OPTS="-Xmx4g" - still failed with OOM
  • The tests pass all subtests (33 for sub-defer.t, 51 for sub-quote.t) but then time out during cleanup
  • The OOM is thrown from the UncaughtExceptionHandler in the orphan-watchdog thread

Root cause identified:
The OOM occurs in ScalarRefRegistry.forceGcAndSnapshot() at line 130, which is called by ReachabilityWalker.sweepWeakRefs() at line 1120 during cleanup. To determine which objects are still reachable, forceGcAndSnapshot runs 3 GC passes, each calling System.gc() 5 times with a 10 ms sleep. With the many CODE refs and weak refs created by Sub::Defer/Sub::Quote, this GC forcing mechanism runs out of memory even with a 4 GB heap.
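
The pass structure described here can be sketched as follows. This is a hypothetical illustration, not the real `ScalarRefRegistry` code: the class `ForcedGcSketch` and its method names are invented, the snapshot step is omitted, and placing the sleep after each individual GC request (rather than once per pass) is an assumption. The GC request is injectable so the loop can be exercised without actually calling `System.gc()`.

```java
// Hypothetical illustration of the GC-forcing loop; NOT the real ScalarRefRegistry.
class ForcedGcSketch {
    static final int PASSES = 3;            // pass count from the PR description
    static final int GC_CALLS_PER_PASS = 5; // System.gc() calls per pass, from the PR description
    static final long SLEEP_MILLIS = 10;    // sleep length from the PR description

    // Runs the nested loop and returns how many GC requests were issued.
    // Assumption: the sleep follows each individual request.
    static int forceGc(Runnable gcRequest, int passes, int callsPerPass, long sleepMillis) {
        int requests = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int call = 0; call < callsPerPass; call++) {
                gcRequest.run();
                requests++;
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    return requests;
                }
            }
        }
        return requests;
    }

    // Same loop with the values quoted in the PR description and a real GC request.
    static int forceGcWithDefaults() {
        return forceGc(System::gc, PASSES, GC_CALLS_PER_PASS, SLEEP_MILLIS);
    }
}
```

Under these assumptions a single sweep issues 3 × 5 = 15 GC requests and sleeps for at least 150 ms in total. The sketch only shows the shape of the loop; it does not reproduce the snapshot work or the memory behavior reported above.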

Because the Sub::Defer/Sub::Quote tests create many CODE refs and weak refs, the reachability walker's GC forcing runs out of memory during cleanup. This mechanism is a fundamental part of the selective refcounting system and is not easily fixable without redesigning that system.

Potential fixes (not implemented in this PR):

  • Skip or reduce GC cycles in forceGcAndSnapshot for memory-intensive scenarios
  • Add environment variable to disable reachability sweeping for specific tests
  • Accept as known limitation - these tests stress the GC beyond normal limits
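
As a rough sketch of how the first two options could be combined, the number of GC passes might be read from the environment, with 0 meaning that the sweep is skipped. Nothing below exists in PerlOnJava: the class `GcPassConfigSketch`, the variable name `PJ_GC_PASSES`, and the clamping rules are all hypothetical and are shown only to make the idea concrete.

```java
import java.util.Map;

// Hypothetical configuration helper; not part of PerlOnJava.
class GcPassConfigSketch {
    static final String ENV_NAME = "PJ_GC_PASSES"; // hypothetical variable name
    static final int DEFAULT_PASSES = 3;            // pass count quoted in this PR

    // Returns the number of GC passes to run, clamped to 0..DEFAULT_PASSES.
    // A missing, empty, or unparsable value falls back to the default.
    static int resolvePasses(Map<String, String> env) {
        String raw = env.get(ENV_NAME);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_PASSES;
        }
        try {
            int requested = Integer.parseInt(raw.trim());
            return Math.max(0, Math.min(requested, DEFAULT_PASSES));
        } catch (NumberFormatException e) {
            return DEFAULT_PASSES;
        }
    }
}
```

A caller would pass `System.getenv()` to `resolvePasses` and skip the sweep when the result is 0. Taking the environment as a `Map` parameter keeps the helper testable without touching the process environment.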

Files Changed

  • WeakRefRegistry.java: Changed clearWeakRefsTo to always include CODE refs

fglock added 3 commits May 19, 2026 13:45
- Modified WeakRefRegistry.clearWeakRefsTo() to always clear weak refs to CODE objects
- Added includeCode parameter to allow conditional CODE ref clearing
- Added clearUnreachableCodeWeakRefs() method to clean up unreachable CODE refs
- Added clear_unreachable_code_weak_refs() function to Scalar::Util
- Modified ScalarUtil.isweak() and weaken() to call clearUnreachableCodeWeakRefs() for CODE refs

This fixes leaks.t tests which were failing because weak refs to CODE objects
were never cleared, preventing Sub::Quote::CLONE from cleaning up expired entries.

Note: hints.t still has 2 failing tests due to a known limitation with caller(0)
returning empty warning bits in main script context (documented in lexical-warnings.md).

The OOM error in sub-defer.t and sub-quote.t is a pre-existing issue
that occurs even without any changes to WeakRefRegistry or ScalarUtil.
This fix only changes clearWeakRefsTo to always include CODE refs,
which fixes leaks.t without exacerbating the OOM issue.

Documents the changes made in PR #759 to fix Sub::Quote leaks.t tests,
the investigation into the OOM error in sub-defer.t and sub-quote.t,
and 6 potential system redesign options to resolve the OOM issue.

Recommended approach is to bundle PerlOnJava-tuned versions of
Sub::Defer/Sub::Quote as the short-term solution.

fglock closed this May 20, 2026