Estimate caret position from a hidden text layout when AX exposes only the field frame#670
Conversation
…y the field frame Hosts whose AX tree only reports AXFrame (caret quality .estimated) used a proportional text-length guess for X and punted to the popup card. Lay out the text before the caret in a hidden TextKit stack constrained to the field's width instead, and anchor the insertion point read off that layout to the field frame. A passing estimate upgrades the overlay geometry to a new .layoutEstimated quality, which renders inline ghost text; any gate rejection (truncated context window, scroll ambiguity, tabs, unusable frame) keeps today's card fallback unchanged.
…pair wrong-line derived rects The uncalibrated layout underestimated vertical position on web editors: TextKit's font-metric line height (~1.2x) compounds against CSS line-height (~1.5-1.7x) into whole-line drift, and the guessed 4pt inset undershoots real content padding. Calibrate with what the host already reveals: the derived caret rect's height is a real rendered line box (pins the paragraph line height), observedCharWidth rescales the layout font so wrap points match, and the child-run walk now also captures the leftmost/topmost run edges as measured content insets. Extend the repair to .derived geometry behind a line-mismatch gate: the estimate only overrides the AX rect when the two disagree vertically by more than three-quarters of a line (the Gmail blank-line drift), so well-behaved derived hosts keep their AX rects untouched. Repair logs now carry the estimate-vs-AX delta and which calibrations applied.
|
Pushed a calibration follow-up (fddda81) after a first field test in a Gmail-style compose box: the uncalibrated layout drifted ~2 lines high on multi-paragraph text (font-metric line height vs the host's larger CSS line-height, compounding per line) and sat too far left (guessed 4pt inset vs real content padding). The layout font, line height, and content insets are now calibrated from the host's own measured run frames, and the repair extends to .derived geometry behind a 0.75-line vertical-mismatch gate so well-behaved derived hosts keep their AX rects. |
…red rects over the layout estimate The child-run walk mapped the caret offset into runs by cumulative text length, silently assuming the parent AX value is the run texts concatenated with nothing in between. Chromium editors separate blocks with newlines the runs do not contain, so every line break before the caret dragged the mapping one character deeper and the ghost landed whole lines below its real run. Align each run's text inside the parent value by sequential search instead; the mapping is then exact under any separator convention, including blank lines the host collapses out of the value entirely. At presentation, a derived rect that came from measured run frames is now always kept: run frames carry the host's real line positions including collapsed blank lines, which the doc-stacked layout estimate cannot see (it sat one line high per collapsed blank in Gmail). The estimate is still computed for the diagnostic delta, and repair logs now carry caret_source.
|
Second field iteration (c987910). The "exactly 3 lines up" report with 3 blank lines above the caret pinned the residual: this host class collapses blank lines out of the AX text, so any layout stacked from the document top loses one line per blank. Text can't recover what the host never reports, but the child-run frames can: they carry the real rendered Y of every line, blanks included. Two changes:
Net positioning sources by host class: run-mapped hosts anchor to measured run frames (alignment-fixed), AXFrame-only hosts use the calibrated layout estimate, prev-char-bounds hosts keep the mismatch gate. |
…rfaces Captured Gmail values exposed three realities the alignment must survive: the parent value flattens blocks with inconsistent separators (sometimes none at all, fusing lines into clumps like "i'mhi"), mixes non-breaking and plain spaces, and can omit run text entirely. Matching now runs on a length-preserving whitespace-normalized form, pass one anchors runs only at word-boundary-clean matches so a short run cannot anchor inside a fused clump, and pass two recovers genuinely fused runs with a plain search constrained between their anchored neighbors. Runs that still cannot anchor are skipped (the caret maps against the rest) instead of dragging the whole mapping back to the legacy cumulative walk. Debugging this class of bug has been throttled by unprovable test conditions, so the surfaces now identify themselves: the caret badge carries the mapping mode (runs-aligned/partial/legacy via the caret source label) and a build stamp from the executable's modification time, and CLAUDE.md documents that dev-identity builds log to "~/Library/Logs/Cotabby Dev/" rather than the prod path.
|
Third iteration (afaedcf), this time with real captured data: the dev-identity build logs to "~/Library/Logs/Cotabby Dev/" (now documented in CLAUDE.md), and its llm-io stream showed what Gmail actually exposes. The parent AX value is the compose region flattened: block boundaries become spaces or vanish entirely (adjacent lines fuse into clumps like "i'mhi"), non-breaking and plain spaces are mixed, and header text rides along. Two consequences:
Also addressed the meta-problem that burned three field iterations: every test so far ran a stale binary (the last one launched a product built before any of this code, which is why it reproduced the original drift exactly). The debug caret badge now shows the caret-to-run mapping mode and a build stamp derived from the executable's modification time, so a single screenshot proves both the build and the mapping path. |
Field logs showed six Tab accepts inside 0.9s followed by "typing invalidated the current suggestion" — with no typing. The synthetic keystroke suppression counter was OVERWRITTEN on each arm, but event-tap delivery is asynchronous: a burst arms the next chunk's suppressions while the previous chunk's keydowns are still in flight, the earlier chunk drains the new tokens, and the later chunk's tail leaks into the observer as user typing, mismatching the already-advanced session and killing the ghost mid-acceptance. Three layers: the suppression counter now accumulates within its expiry window instead of overwriting; the observer tap recognizes Cotabby's own events by their synthetic marker first (identity survives any counter race; the marker was already stamped on every posted event but only the accept tap checked it); and the caret layout estimator no longer runs at all on run-measured derived presents (its result was discarded unconditionally) plus memoizes identical inputs, removing TextKit layout work from the accept keystroke's handling window. InputSuppressionController gains the nonisolated deinit required to survive app-hosted test deallocation (the known isolated-deinit back-deploy double-free).
|
Fourth iteration (c4a34d0): the caret positioning was field-confirmed working, and the remaining report was rapid Tab accepts sometimes dropping the suggestion. The field logs had it precisely: six Tab-accepted-chunk stages inside 0.9s, then "Invalidating active suggestion: Overlay hidden because typing invalidated the current suggestion" with no typing involved. Root cause: the synthetic-keystroke suppression counter was overwritten on every arm. Tap delivery is asynchronous, so a burst arms chunk N+1's suppressions while chunk N's keydowns are still in flight; chunk N drains the new tokens and chunk N+1's tail leaks into the observer as user typing, which mismatches the already-advanced session and kills the ghost mid-acceptance. Fixes, three layers: the counter accumulates within its expiry window; the observer tap now checks the synthetic event marker first (it was already stamped on every posted event, but only the accept tap consulted it, and identity survives any counter race); and the layout estimator no longer runs on run-measured derived presents (result was discarded unconditionally) and memoizes identical inputs, keeping TextKit work out of the accept keystroke's handling window entirely. Validation: full suite green including new InputSuppressionControllerTests locking the accumulate/expiry/marker rules (plus the nonisolated deinit the app-hosted test host requires); swiftlint clean. |
# Conflicts: # Cotabby.xcodeproj/project.pbxproj # Cotabby/Services/Input/InputMonitor.swift
Summary
Hosts whose AX tree is too shallow to report real caret geometry (only
AXFrameis available, quality.estimated) either drifted on a proportional text-length guess or punted to the popup card. This PR addsTextLayoutCaretEstimator: at presentation time, the text before the caret is laid out in a hidden TextKit stack constrained to the field's width, and the insertion point read off that layout — soft wraps included — becomes the overlay anchor under a new.layoutEstimatedquality that renders inline ghost text.The layout is generalized (deliberately no per-app tables) but calibrated with what the host itself reveals: a
.derivedcaret rect's height is a real rendered line box and pins the paragraph line height (CSS line-height routinely exceeds font metrics, and that error compounds per line);observedCharWidthrescales the layout font so soft-wrap points match the host; and the child-run walk now also captures the leftmost/topmost run edges, replacing guessed content insets with measured padding.Repair applies in two modes:
.estimated(AXFrame-only hosts): a passing estimate always replaces the guess..derived(child-run hosts like Gmail/Outlook): the estimate only overrides the AX rect when the two disagree vertically by more than three-quarters of a line — the Gmail blank-line drift that maps the caret into the wrong visual line. On agreement the AX rect is kept, so well-behaved derived hosts never regress.The estimator stays conservative: a prefix that filled the 4096-unit context window, laid-out content that overflows the visible field (scroll ambiguity), tab characters, or an unusable frame reject the estimate and the presentation keeps the existing fallback bit-for-bit. The word-accept path feeds the not-yet-published insertion chunk into the layout, so the repaired anchor models wrap-to-next-line, which the existing X-shift prediction cannot.
Validation
Manually exercised against a Gmail-style compose field (first iteration); the calibration pass in the second commit came out of that field test. The repair logs a structured
caret-layout-repairstage (outcome, rejection reason, estimate-vs-AX vertical delta, which calibrations applied,request_id) to the JSONL stream under-cotabby-debugfor further field verification.Linked issues
Refs #94 — improves the
.estimatedand wrong-line.derivedslices of that report: after a word accept, the layout estimate (with the pending insertion appended) places the remaining ghost on the correct visual line instead of marching along the old one. Hosts whose AX geometry is exact remain on post-insertion AX refresh.Risk / rollout notes
.autonow render inline when the estimate passes the gates.alwaysMirrorstill forces the card; mid-line promotion still applies..estimated/.derivedhosts only — presentation path, never the focus poll.caret-layout-repairlog stage.Metricsenum for tuning.project.pbxprojregenerated by XcodeGen (additions only).Greptile Summary
This PR introduces
TextLayoutCaretEstimator, which estimates caret geometry via a hidden TextKit layout pass when the host's AX tree only exposes anAXFrame. The estimator is calibrated with the host's own observed measurements (line-box height, char width, content edges) and is conservatively gated — tabs, truncated context, overflow, and unusable frames all cause it to fall back to the previous behavior bit-for-bit.TextLayoutCaretEstimator: performs a one-shot detached TextKit layout of the prefix text to determine the insertion-point position, replacing the AXFrame proportional guess for.estimated-quality contexts and overriding.derivedrects that drift more than ¾ of a line vertically.AXTextGeometryResolver: replaces the old cumulative-length walk (which drifted by one character per paragraph separator) with a text-alignment based anchor pass plus a boundary-clean secondary pass, eliminating the multi-line Gmail drift regression.registerSyntheticInsertionnow accumulates rather than overwrites tokens across rapid accept bursts, andInputMonitorchecks synthetic identity before the token countdown so a racing real keystroke cannot drain a slot and allow a synthetic Cmd-V to leak through.Confidence Score: 5/5
Safe to merge. All new behavior is behind conservative gates that fall back to existing behavior on any uncertainty, and the input-suppression fix addresses a real rapid-accept regression without touching any other path.
The estimator is purely additive: every rejection gate preserves the old behavior bit-for-bit, and the single-entry memo means the TextKit allocation happens at most once per unique (text, frame) pair. The run-mapping rewrite in AXTextGeometryResolver is well-covered by the captured-Gmail-fixture test. The input-suppression accumulation fix has its own new test suite. The only finding is a diagnostic dead-code path in the JSONL log — the run-measured bypass is unreachable via keptReason, so that label never appears in logs — which affects observability only, not correctness.
The logCaretLayoutRepair function in SuggestionCoordinator+Acceptance.swift has a dead log value ("kept_ax_run_measured") that will never be emitted; the run-measured bypass exits before the logger is reached. This affects debuggability of the Gmail/Outlook path but not correctness.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A["presentOverlay(text, caretRect, context)"] --> B["layoutRepairedAnchor()"] B --> C{caretQuality?} C -- ".exact or .layoutEstimated" --> D["Return fallback unchanged"] C -- ".derived with observedContentEdges" --> E["Run-measured: skip estimator\nReturn fallback (.derived)"] C -- ".estimated" --> F["TextLayoutCaretEstimator.estimate(input)"] C -- ".derived (no content edges)" --> F F --> G{Rejection gates} G -- "prefixTruncated / fieldFrameUnusable\ncontainsTab / verticalOverflow\nhorizontalOverflow / layoutFailed" --> H["Return fallback, quality unchanged"] G -- "estimate accepted" --> I{Quality == .derived?} I -- "Yes: verticallyAgrees?" --> J{"|estimate.midY - ax.midY| ≤ 0.75×lineH"} J -- "Agrees" --> K["Keep AX rect, quality .derived"] J -- "Disagrees" --> L["Substitute estimate, quality .layoutEstimated"] I -- "No (.estimated)" --> L L --> M["logCaretLayoutRepair"] K --> M M --> N["SuggestionOverlayGeometry"] N --> O{CompletionRenderModePolicy} O -- ".layoutEstimated" --> P["Ghost text inline"] O -- ".estimated" --> Q["Popup card"]Reviews (4): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile