
Estimate caret position from a hidden text layout when AX exposes only the field frame #670

Merged: FuJacob merged 6 commits into main from feat/textkit-caret-estimation on Jun 11, 2026

FuJacob (Owner) commented Jun 11, 2026

Summary

On hosts whose AX tree is too shallow to report real caret geometry (only AXFrame is available, quality .estimated), the overlay either drifted on a proportional text-length guess or fell back to the popup card. This PR adds TextLayoutCaretEstimator: at presentation time, the text before the caret is laid out in a hidden TextKit stack constrained to the field's width. The insertion point read off that layout (soft wraps included) becomes the overlay anchor under a new .layoutEstimated quality that renders inline ghost text.
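As a rough illustration of the hidden-layout idea, the sketch below lays a prefix out in a detached TextKit 1 stack and reads the caret position off the last line. It is not the PR's TextLayoutCaretEstimator: the function name estimateCaretOrigin and its parameters are hypothetical, and it ignores calibration, insets, and the rejection gates. Coordinates are relative to the text container's top-left corner, so a caller would still have to map them into the field frame.

```swift
import AppKit

/// Hypothetical helper: estimated caret origin (top-left of the insertion
/// point) in container coordinates, with y growing downward.
func estimateCaretOrigin(prefix: String, fieldWidth: CGFloat, font: NSFont) -> CGPoint {
    // Detached TextKit 1 stack: storage -> layout manager -> container.
    let storage = NSTextStorage(string: prefix, attributes: [.font: font])
    let layoutManager = NSLayoutManager()
    let container = NSTextContainer(
        size: NSSize(width: fieldWidth, height: CGFloat.greatestFiniteMagnitude))
    container.lineFragmentPadding = 0  // keep x = 0 at the container's left edge
    layoutManager.addTextContainer(container)
    storage.addLayoutManager(layoutManager)
    layoutManager.ensureLayout(for: container)

    let glyphCount = layoutManager.numberOfGlyphs
    // Empty text or a trailing hard line break: the caret sits in the
    // extra line fragment that follows the laid-out text.
    if glyphCount == 0 || prefix.hasSuffix("\n") {
        return layoutManager.extraLineFragmentRect.origin
    }
    // Otherwise the caret sits at the trailing edge of the last line,
    // wherever soft wrapping placed it.
    let used = layoutManager.lineFragmentUsedRect(
        forGlyphAt: glyphCount - 1, effectiveRange: nil)
    return CGPoint(x: used.maxX, y: used.minY)
}
```

Because the container is only as wide as the field, a prefix that soft-wraps yields a caret on a lower line than the same font would give in an unconstrained layout, which is the property the proportional guess could not model.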

The layout is generalized (deliberately no per-app tables) but calibrated with what the host itself reveals: a .derived caret rect's height is a real rendered line box and pins the paragraph line height (CSS line-height routinely exceeds font metrics, and that error compounds per line); observedCharWidth rescales the layout font so soft-wrap points match the host; and the child-run walk now also captures the leftmost/topmost run edges, replacing guessed content insets with measured padding.
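The two calibrations are simple arithmetic, sketched here under stated assumptions: glyph advance is treated as scaling linearly with point size, and the function names (calibratedPointSize, verticalDrift) are hypothetical rather than taken from the PR.

```swift
/// Point size that makes the layout font's average character width match
/// the width observed on the host. Assumes advance scales linearly with size.
func calibratedPointSize(basePointSize: Double,
                         baseCharWidth: Double,
                         observedCharWidth: Double) -> Double {
    guard baseCharWidth > 0, observedCharWidth > 0 else { return basePointSize }
    return basePointSize * (observedCharWidth / baseCharWidth)
}

/// Vertical error accumulated after `lines` full lines when the layout's
/// line height differs from the host's rendered line box.
func verticalDrift(lines: Int, layoutLineHeight: Double, hostLineHeight: Double) -> Double {
    Double(lines) * (hostLineHeight - layoutLineHeight)
}
```

For example, a 13 pt font laid out at a 1.2x line height (15.6 pt) against a host rendering 1.5x (19.5 pt) is off by 3.9 pt per line, so ten lines of text put the estimate about 39 pt, or two host lines, too high. Pinning the paragraph line height to the measured line box removes that compounding term.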

Repair applies in two modes:

  • .estimated (AXFrame-only hosts): a passing estimate always replaces the guess.
  • .derived (child-run hosts like Gmail/Outlook): the estimate only overrides the AX rect when the two disagree vertically by more than three-quarters of a line — the Gmail blank-line drift that maps the caret into the wrong visual line. On agreement the AX rect is kept, so well-behaved derived hosts never regress.
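The .derived rule reduces to a single comparison. A minimal sketch, assuming the gate compares vertical midpoints against 0.75 of the line height; the type and function names are hypothetical:

```swift
enum RepairDecision: Equatable {
    case keepAX              // estimate agrees with AX: leave the AX rect untouched
    case substituteEstimate  // estimate lands on a different visual line
}

/// Overrides the AX rect only when the two disagree vertically by MORE than
/// `tolerance` lines; exact equality with the threshold still keeps AX.
func decideDerivedRepair(estimateMidY: Double,
                         axMidY: Double,
                         lineHeight: Double,
                         tolerance: Double = 0.75) -> RepairDecision {
    let delta = abs(estimateMidY - axMidY)
    return delta > tolerance * lineHeight ? .substituteEstimate : .keepAX
}
```

With a 20 pt line, a 10 pt disagreement keeps the AX rect while a 16 pt disagreement substitutes the estimate. The tolerance is deliberately below one full line so that a whole-line miss always trips the gate, but sub-line jitter does not.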

The estimator stays conservative. Any of the following rejects the estimate, and the presentation then keeps the existing fallback bit-for-bit: a prefix that filled the 4096 UTF-16-unit context window, laid-out content that overflows the visible field (scroll ambiguity), tab characters, or an unusable frame. The word-accept path feeds the not-yet-published insertion chunk into the layout, so the repaired anchor models wrap-to-next-line, which the existing X-shift prediction cannot.
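The gates can be pictured as cheap checks before and after the layout pass. The sketch below is an assumption about their shape, not the PR's code: the case names follow the rejection labels in the review flowchart on this page, while EstimateInput, the two functions, and the exact comparisons are hypothetical.

```swift
enum Rejection: Equatable {
    case prefixTruncated, fieldFrameUnusable, containsTab, verticalOverflow
}

/// Hypothetical input bundle for the gates.
struct EstimateInput {
    var prefix: String
    var fieldWidth: Double
    var fieldHeight: Double
}

let contextWindowUnits = 4096  // UTF-16 units, per the PR description

/// Checks that need no layout at all.
func preLayoutRejection(_ input: EstimateInput) -> Rejection? {
    let sizeIsUsable = input.fieldWidth.isFinite && input.fieldHeight.isFinite
        && input.fieldWidth > 0 && input.fieldHeight > 0
    if !sizeIsUsable { return .fieldFrameUnusable }
    // A prefix that fills the window may have lost its beginning.
    if input.prefix.utf16.count >= contextWindowUnits { return .prefixTruncated }
    // Tab stops depend on host settings the layout cannot see.
    if input.prefix.contains("\t") { return .containsTab }
    return nil
}

/// Check that needs the laid-out height: content taller than the field
/// means the host may be scrolled, so the caret's Y is ambiguous.
func postLayoutRejection(laidOutHeight: Double, fieldHeight: Double) -> Rejection? {
    laidOutHeight > fieldHeight ? .verticalOverflow : nil
}
```

Returning a reason rather than a bare Bool is what lets a repair log record why an estimate was refused.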

Validation

xcodebuild -project Cotabby.xcodeproj -scheme Cotabby -destination 'platform=macOS' build \
  -derivedDataPath build/DerivedData
# ** BUILD SUCCEEDED **

xcodebuild test ... -only-testing:CotabbyTests/TextLayoutCaretEstimatorTests \
  -only-testing:CotabbyTests/SuggestionCaretLayoutRepairTests \
  -only-testing:CotabbyTests/CompletionRenderModePolicyTests \
  -only-testing:CotabbyTests/FocusSnapshotResolverSelectionTests \
  CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO
# ** TEST SUCCEEDED ** — 50 tests, 0 failures
# (locks coordinate mapping, wrap math, RTL right-alignment, hanging-space clamp, every
#  rejection gate, the calibration paths, and the derived agreement/mismatch rule)

xcodebuild test ... -skip-testing:CotabbyTests/FoundationModelDriftEvalTests \
  CODE_SIGNING_ALLOWED=NO CODE_SIGNING_REQUIRED=NO
# ** TEST SUCCEEDED ** — full suite

swiftlint lint --quiet
# exit 0, no findings

Manually exercised against a Gmail-style compose field (first iteration); the calibration pass in the second commit came out of that field test. The repair logs a structured caret-layout-repair stage (outcome, rejection reason, estimate-vs-AX vertical delta, which calibrations applied, request_id) to the JSONL stream under -cotabby-debug for further field verification.

Linked issues

Refs #94 — improves the .estimated and wrong-line .derived slices of that report: after a word accept, the layout estimate (with the pending insertion appended) places the remaining ghost on the correct visual line instead of marching along the old one. Hosts whose AX geometry is exact remain on post-insertion AX refresh.

Risk / rollout notes

  • Behavior change (estimated): hosts that previously always promoted to the popup card under .auto now render inline when the estimate passes the gates. alwaysMirror still forces the card; mid-line promotion still applies.
  • Behavior change (derived): a derived AX rect can now be overridden, but only past the 0.75-line vertical-mismatch gate; agreement keeps AX exactly as before. Flip-flop near the threshold is possible in principle; the generous tolerance and the presenter's equality short-circuit bound it.
  • Performance: one detached TextKit layout of at most 4096 UTF-16 units per overlay presentation, on .estimated/.derived hosts only — presentation path, never the focus poll.
  • Graceful degradation: a session can flap inline → card when a gate trips mid-session (e.g. the text grows past the visible field). Intentional, diagnosable via the caret-layout-repair log stage.
  • Known approximations: symmetric-padding assumption for measured insets; measured top edge skipped when the prefix starts with a blank line; wrong-font hosts without measurable runs still rely on the fits-in-field gate. Constants live in one Metrics enum for tuning.
  • project.pbxproj regenerated by XcodeGen (additions only).

Greptile Summary

This PR introduces TextLayoutCaretEstimator, which estimates caret geometry via a hidden TextKit layout pass when the host's AX tree only exposes an AXFrame. The estimator is calibrated with the host's own observed measurements (line-box height, char width, content edges) and is conservatively gated — tabs, truncated context, overflow, and unusable frames all cause it to fall back to the previous behavior bit-for-bit.

  • New TextLayoutCaretEstimator: performs a one-shot detached TextKit layout of the prefix text to determine the insertion-point position, replacing the AXFrame proportional guess for .estimated-quality contexts and overriding .derived rects that drift more than ¾ of a line vertically.
  • Improved child-run placement in AXTextGeometryResolver: replaces the old cumulative-length walk (which drifted by one character per paragraph separator) with a text-alignment based anchor pass plus a boundary-clean secondary pass, eliminating the multi-line Gmail drift regression.
  • Input suppression fix: registerSyntheticInsertion now accumulates rather than overwrites tokens across rapid accept bursts, and InputMonitor checks synthetic identity before the token countdown so a racing real keystroke cannot drain a slot and allow a synthetic Cmd-V to leak through.

Confidence Score: 5/5

Safe to merge. All new behavior is behind conservative gates that fall back to existing behavior on any uncertainty, and the input-suppression fix addresses a real rapid-accept regression without touching any other path.

The estimator is purely additive: every rejection gate preserves the old behavior bit-for-bit, and the single-entry memo means the TextKit allocation happens at most once per unique (text, frame) pair. The run-mapping rewrite in AXTextGeometryResolver is well-covered by the captured-Gmail-fixture test. The input-suppression accumulation fix has its own new test suite. The only finding is a diagnostic dead-code path in the JSONL log — the run-measured bypass is unreachable via keptReason, so that label never appears in logs — which affects observability only, not correctness.

The logCaretLayoutRepair function in SuggestionCoordinator+Acceptance.swift has a dead log value ("kept_ax_run_measured") that will never be emitted; the run-measured bypass exits before the logger is reached. This affects debuggability of the Gmail/Outlook path but not correctness.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Cotabby/Support/TextLayoutCaretEstimator.swift | New file: implements the hidden TextKit layout pass. Well-gated with conservative rejection conditions, a single-entry memo cache, and clear coordinate-space comments. No issues found. |
| Cotabby/App/Coordinators/SuggestionCoordinator+Acceptance.swift | Adds layoutRepairedAnchor() and logCaretLayoutRepair(); the repair logic and the keptReason string construction have a minor logic gap for the run-measured branch (see comment). |
| Cotabby/Services/Focus/AXTextGeometryResolver.swift | Replaces the cumulative-length caret-to-run mapping with a text-alignment-based pass; introduces boundary-clean anchoring and a windowed second pass for fused blocks. Logic is correct and well-tested. |
| Cotabby/Services/Input/InputSuppressionController.swift | Changes registerSyntheticInsertion to accumulate rather than overwrite, fixing rapid-burst token drain; adds nonisolated deinit to avoid a MainActor double-free crash in tests. |
| Cotabby/Services/Input/InputMonitor.swift | Re-orders synthetic identity check before countdown check so the synthetic marker always suppresses even if a racing real keystroke consumed the token. |
| Cotabby/Models/FocusModels.swift | Adds .layoutEstimated case to CaretGeometryQuality and ObservedContentEdges struct; clean additions with good documentation. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["presentOverlay(text, caretRect, context)"] --> B["layoutRepairedAnchor()"]
    B --> C{caretQuality?}
    C -- ".exact or .layoutEstimated" --> D["Return fallback unchanged"]
    C -- ".derived with observedContentEdges" --> E["Run-measured: skip estimator\nReturn fallback (.derived)"]
    C -- ".estimated" --> F["TextLayoutCaretEstimator.estimate(input)"]
    C -- ".derived (no content edges)" --> F
    F --> G{Rejection gates}
    G -- "prefixTruncated / fieldFrameUnusable\ncontainsTab / verticalOverflow\nhorizontalOverflow / layoutFailed" --> H["Return fallback, quality unchanged"]
    G -- "estimate accepted" --> I{Quality == .derived?}
    I -- "Yes: verticallyAgrees?" --> J{"|estimate.midY - ax.midY| ≤ 0.75×lineH"}
    J -- "Agrees" --> K["Keep AX rect, quality .derived"]
    J -- "Disagrees" --> L["Substitute estimate, quality .layoutEstimated"]
    I -- "No (.estimated)" --> L
    L --> M["logCaretLayoutRepair"]
    K --> M
    M --> N["SuggestionOverlayGeometry"]
    N --> O{CompletionRenderModePolicy}
    O -- ".layoutEstimated" --> P["Ghost text inline"]
    O -- ".estimated" --> Q["Popup card"]

Reviews (4): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..."

…y the field frame

Hosts whose AX tree only reports AXFrame (caret quality .estimated) used a
proportional text-length guess for X and punted to the popup card. Lay out
the text before the caret in a hidden TextKit stack constrained to the
field's width instead, and anchor the insertion point read off that layout
to the field frame. A passing estimate upgrades the overlay geometry to a
new .layoutEstimated quality, which renders inline ghost text; any gate
rejection (truncated context window, scroll ambiguity, tabs, unusable
frame) keeps today's card fallback unchanged.
Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Acceptance.swift
Comment thread Cotabby/App/Coordinators/SuggestionCoordinator+Acceptance.swift Outdated
…pair wrong-line derived rects

The uncalibrated layout underestimated vertical position on web editors:
TextKit's font-metric line height (~1.2x) compounds against CSS line-height
(~1.5-1.7x) into whole-line drift, and the guessed 4pt inset undershoots
real content padding. Calibrate with what the host already reveals: the
derived caret rect's height is a real rendered line box (pins the paragraph
line height), observedCharWidth rescales the layout font so wrap points
match, and the child-run walk now also captures the leftmost/topmost run
edges as measured content insets.

Extend the repair to .derived geometry behind a line-mismatch gate: the
estimate only overrides the AX rect when the two disagree vertically by
more than three-quarters of a line (the Gmail blank-line drift), so
well-behaved derived hosts keep their AX rects untouched. Repair logs now
carry the estimate-vs-AX delta and which calibrations applied.

FuJacob (Owner, Author) commented Jun 11, 2026

Pushed a calibration follow-up (fddda81) after a first field test in a Gmail-style compose box: the uncalibrated layout drifted ~2 lines high on multi-paragraph text (font-metric line height vs the host's larger CSS line-height, compounding per line) and sat too far left (guessed 4pt inset vs real content padding). The layout font, line height, and content insets are now calibrated from the host's own measured run frames, and the repair extends to .derived geometry behind a 0.75-line vertical-mismatch gate so well-behaved derived hosts keep their AX rects.

…red rects over the layout estimate

The child-run walk mapped the caret offset into runs by cumulative text
length, silently assuming the parent AX value is the run texts concatenated
with nothing in between. Chromium editors separate blocks with newlines the
runs do not contain, so every line break before the caret dragged the
mapping one character deeper and the ghost landed whole lines below its
real run. Align each run's text inside the parent value by sequential
search instead; the mapping is then exact under any separator convention,
including blank lines the host collapses out of the value entirely.

At presentation, a derived rect that came from measured run frames is now
always kept: run frames carry the host's real line positions including
collapsed blank lines, which the doc-stacked layout estimate cannot see
(it sat one line high per collapsed blank in Gmail). The estimate is still
computed for the diagnostic delta, and repair logs now carry caret_source.

FuJacob (Owner, Author) commented Jun 11, 2026

Second field iteration (c987910). The "exactly 3 lines up" report with 3 blank lines above the caret pinned the residual: this host class collapses blank lines out of the AX text, so any layout stacked from the document top loses one line per blank. Text can't recover what the host never reports, but the child-run frames can: they carry the real rendered Y of every line, blanks included.

Two changes:

  • The run walk now maps the caret to a run by aligning run texts inside the parent value (sequential search) instead of cumulative lengths. Cumulative math drifted one character per separator newline, which is what used to park the ghost several lines below after a few paragraphs. Alignment is exact under any separator convention, including collapsed blanks.
  • At presentation, a derived rect that came from measured run frames is always kept; the doc-stacked estimate (blank-blind) never overrides it. The estimate still runs for the diagnostic delta, and the repair log now records caret_source plus kept_ax_run_measured vs kept_ax_agreement vs substituted.
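The difference between the two mappings shows up on a three-line value. This is a minimal sketch of the sequential-search idea only; the function names are hypothetical and the PR's resolver handles more cases than this.

```swift
import Foundation

/// Legacy mapping: assumes the parent value is the run texts concatenated
/// with nothing in between, so each run starts where the previous one ended.
func cumulativeOffsets(_ runs: [String]) -> [Int] {
    var offsets: [Int] = []
    var offset = 0
    for run in runs {
        offsets.append(offset)
        offset += run.utf16.count
    }
    return offsets
}

/// Alignment mapping: finds each run inside the parent value, searching
/// forward from the end of the previous match. Separators of any length
/// between runs are simply skipped over.
func alignedOffsets(_ runs: [String], in parent: String) -> [Int?] {
    var cursor = parent.startIndex
    var offsets: [Int?] = []
    for run in runs {
        guard !run.isEmpty,
              let match = parent.range(of: run, options: .literal,
                                       range: cursor..<parent.endIndex)
        else {
            offsets.append(nil)  // unmatched run: skip it, keep the cursor
            continue
        }
        offsets.append(match.lowerBound.utf16Offset(in: parent))
        cursor = match.upperBound
    }
    return offsets
}
```

For runs "hello", "world", "foo" inside the parent value "hello\nworld\nfoo", the cumulative walk reports starts of 0, 5, 10 while the real starts are 0, 6, 12: one character of drift per separator newline, which is the error the comment above describes.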

Net positioning sources by host class: run-mapped hosts anchor to measured run frames (alignment-fixed), AXFrame-only hosts use the calibrated layout estimate, prev-char-bounds hosts keep the mismatch gate.

…rfaces

Captured Gmail values exposed three realities the alignment must survive:
the parent value flattens blocks with inconsistent separators (sometimes
none at all, fusing lines into clumps like "i'mhi"), mixes non-breaking
and plain spaces, and can omit run text entirely. Matching now runs on a
length-preserving whitespace-normalized form, pass one anchors runs only
at word-boundary-clean matches so a short run cannot anchor inside a fused
clump, and pass two recovers genuinely fused runs with a plain search
constrained between their anchored neighbors. Runs that still cannot
anchor are skipped (the caret maps against the rest) instead of dragging
the whole mapping back to the legacy cumulative walk.

Debugging this class of bug has been throttled by unprovable test
conditions, so the surfaces now identify themselves: the caret badge
carries the mapping mode (runs-aligned/partial/legacy via the caret
source label) and a build stamp from the executable's modification time,
and CLAUDE.md documents that dev-identity builds log to
"~/Library/Logs/Cotabby Dev/" rather than the prod path.

FuJacob (Owner, Author) commented Jun 11, 2026

Third iteration (afaedcf), this time with real captured data: the dev-identity build logs to "~/Library/Logs/Cotabby Dev/" (now documented in CLAUDE.md), and its llm-io stream showed what Gmail actually exposes. The parent AX value is the compose region flattened: block boundaries become spaces or vanish entirely (adjacent lines fuse into clumps like "i'mhi"), non-breaking and plain spaces are mixed, and header text rides along. Two consequences:

  • Text-layout estimation can never reconstruct visual lines on this host class; run frames are the only vertical truth. (Already enforced by the run-measured keep rule.)
  • The run alignment needed hardening: matching now happens on a length-preserving whitespace-normalized form; pass one anchors runs only at word-boundary-clean matches so "hi" cannot anchor inside "i'mhi"; pass two recovers genuinely fused runs inside the window between anchored neighbors; unanchorable runs are skipped instead of dragging everything back to the cumulative walk.
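A sketch of the first hardening pass, under assumptions: "length-preserving" is modeled at the Character level, and a match counts as boundary-clean when neither neighbor is a letter or digit. The helper names are hypothetical and the windowed second pass for fused runs is omitted.

```swift
/// Replaces every whitespace character (including non-breaking spaces and
/// newlines) with one plain space, so offsets into the result equal
/// Character offsets into the original text.
func normalized(_ text: String) -> [Character] {
    text.map { $0.isWhitespace ? " " : $0 }
}

func isWordCharacter(_ c: Character) -> Bool {
    c.isLetter || c.isNumber
}

/// First offset at or after `start` where `run` matches `parent` and is not
/// glued to a word character on either side; nil if no such match exists.
func boundaryCleanOffset(of run: [Character],
                         in parent: [Character],
                         from start: Int = 0) -> Int? {
    guard !run.isEmpty, start >= 0, run.count <= parent.count - start else { return nil }
    for i in start...(parent.count - run.count) {
        let end = i + run.count
        guard parent[i..<end].elementsEqual(run) else { continue }
        let cleanBefore = i == 0 || !isWordCharacter(parent[i - 1])
        let cleanAfter = end == parent.count || !isWordCharacter(parent[end])
        if cleanBefore && cleanAfter { return i }
    }
    return nil
}
```

Given the value "i'mhi" followed by a non-breaking space and "hi", the run "hi" is rejected at offset 3 (its left neighbor is "m") and anchors at offset 6 instead. With only the fused clump "i'mhi" available, the first pass returns nil and leaves the run for a later, constrained search.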

Also addressed the meta-problem that burned three field iterations: every test so far ran a stale binary (the last one launched a product built before any of this code, which is why it reproduced the original drift exactly). The debug caret badge now shows the caret-to-run mapping mode and a build stamp derived from the executable's modification time, so a single screenshot proves both the build and the mapping path.
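One way to derive such a build stamp is to read the executable file's modification date. A minimal sketch, assuming an ISO 8601 rendering; the function name buildStamp and the output format are hypothetical, not the badge's actual format.

```swift
import Foundation

/// Returns the file's modification time as an ISO 8601 string, or nil if
/// the file's attributes cannot be read.
func buildStamp(forExecutableAt path: String) -> String? {
    guard let attributes = try? FileManager.default.attributesOfItem(atPath: path),
          let modified = attributes[.modificationDate] as? Date
    else { return nil }
    return ISO8601DateFormatter().string(from: modified)
}
```

In an app this could be called with Bundle.main.executablePath. Since the linker rewrites the binary on every build, the stamp changes whenever the product does, which is enough to tell a stale binary from a fresh one in a screenshot.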

Field logs showed six Tab accepts inside 0.9s followed by "typing
invalidated the current suggestion" — with no typing. The synthetic
keystroke suppression counter was OVERWRITTEN on each arm, but event-tap
delivery is asynchronous: a burst arms the next chunk's suppressions
while the previous chunk's keydowns are still in flight, the earlier
chunk drains the new tokens, and the later chunk's tail leaks into the
observer as user typing, mismatching the already-advanced session and
killing the ghost mid-acceptance.

Three layers: the suppression counter now accumulates within its expiry
window instead of overwriting; the observer tap recognizes Cotabby's own
events by their synthetic marker first (identity survives any counter
race; the marker was already stamped on every posted event but only the
accept tap checked it); and the caret layout estimator no longer runs at
all on run-measured derived presents (its result was discarded
unconditionally) plus memoizes identical inputs, removing TextKit layout
work from the accept keystroke's handling window.

InputSuppressionController gains the nonisolated deinit required to
survive app-hosted test deallocation (the known isolated-deinit
back-deploy double-free).

FuJacob (Owner, Author) commented Jun 11, 2026

Fourth iteration (c4a34d0): the caret positioning was field-confirmed working, and the remaining report was rapid Tab accepts sometimes dropping the suggestion. The field logs had it precisely: six Tab-accepted-chunk stages inside 0.9s, then "Invalidating active suggestion: Overlay hidden because typing invalidated the current suggestion" with no typing involved.

Root cause: the synthetic-keystroke suppression counter was overwritten on every arm. Tap delivery is asynchronous, so a burst arms chunk N+1's suppressions while chunk N's keydowns are still in flight; chunk N drains the new tokens and chunk N+1's tail leaks into the observer as user typing, which mismatches the already-advanced session and kills the ghost mid-acceptance.

Fixes, three layers: the counter accumulates within its expiry window; the observer tap now checks the synthetic event marker first (it was already stamped on every posted event, but only the accept tap consulted it, and identity survives any counter race); and the layout estimator no longer runs on run-measured derived presents (result was discarded unconditionally) and memoizes identical inputs, keeping TextKit work out of the accept keystroke's handling window entirely.
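The first two layers can be modeled with a small counter. This is a simplified sketch, not InputSuppressionController: the type, its fields, and the use of plain Double timestamps are assumptions made so the race is easy to replay.

```swift
struct SuppressionCounter {
    private(set) var pending = 0
    private var expiresAt = 0.0
    let window: Double

    init(window: Double) { self.window = window }

    /// Arms `count` suppressions. Tokens still inside the expiry window are
    /// kept and added to (the fix); the old behavior was `pending = count`.
    mutating func arm(count: Int, now: Double) {
        if now >= expiresAt { pending = 0 }  // earlier tokens have expired
        pending += count
        expiresAt = now + window
    }

    /// Decides whether an observed key event should be hidden from the
    /// typing observer.
    mutating func shouldSuppress(isSynthetic: Bool, now: Double) -> Bool {
        // Identity first: an event carrying the synthetic marker is always
        // suppressed, whatever state the counter is in.
        if isSynthetic {
            if pending > 0 { pending -= 1 }
            return true
        }
        guard pending > 0, now < expiresAt else { return false }
        pending -= 1
        return true
    }
}
```

Replaying the burst: arming 3 tokens and then 3 more while the first chunk is still in flight leaves 6 pending, so all six keydowns are suppressed. With overwrite semantics the second arm would reset the count to 3 and the last three events would reach the observer as user typing.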

Validation: full suite green including new InputSuppressionControllerTests locking the accumulate/expiry/marker rules (plus the nonisolated deinit the app-hosted test host requires); swiftlint clean.

# Conflicts:
#	Cotabby.xcodeproj/project.pbxproj
#	Cotabby/Services/Input/InputMonitor.swift
@FuJacob FuJacob merged commit c539e98 into main Jun 11, 2026
4 checks passed