Upstream 2026.06.15 #149

Merged: handstandsam merged 1 commit into main from upstream-2026.06.15 on Jun 15, 2026
@handstandsam

Collaborator

Summary

Syncs the latest internal changes into the open-source repo (through 2026.06.15).
52 files changed (9 new). The headline themes for anyone consuming Trailblaze:

  • assertVisible can now check an element's text, not just its presence. Pass
    the new optional expectedText to assert the rendered value (e.g. "expect the
    checkout button to read Charge $5.00", "expect status to be Active"). Volatile
    trailing state like a live item count (Review sale\n3 items) is detected at
    capture time and matched loosely so it doesn't make replays brittle.
  • New clearText tool wipes the focused text field completely before you type.
    The bare Maestro eraseText silently caps at 50 characters on every driver, so
    long values (passwords, addresses, search queries) used to survive a "clear" and
    concatenate onto the next input. clearText reads the field's real length and
    erases exactly that many characters.
  • The agent stops tapping the wrong thing when its target is off-screen. When a
    step names a target that isn't among the tappable elements, the agent now scrolls
    to reveal it (if there's a scroll affordance) or stops with a clear "you may be on
    the wrong screen" signal — instead of tapping an unrelated nearby element and
    marching on.
  • Soft-keyboard mis-taps on Android are caught at the point of impact. If a
    resolved tap coordinate falls inside the on-screen keyboard's bounds (which happens
    on modal screens that swallow the Back gesture and refuse to dismiss the keyboard),
    the tap now fails loudly with a clear message instead of silently landing on the
    keyboard.
  • Multiple Trailblaze daemons can run on one host without stepping on each other.
    A new TRAILBLAZE_HOME environment variable isolates each daemon's state directory
    (logs, TLS keystore) so concurrent daemons don't race a shared keystore on boot.
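
To illustrate the clearText bullet above, here is a minimal sketch of why a single capped erase leaves text behind and how erasing by the field's real length avoids that. FakeTextField, eraseCapped, and clearCompletely are hypothetical names for illustration only; this is not the ClearTextTrailblazeTool implementation, and the chunked loop is just one way to erase "exactly that many characters" on top of a capped primitive.

```kotlin
// Assumption: models the 50-character cap described above.
val ERASE_CAP = 50

// Hypothetical in-memory stand-in for a focused text field.
class FakeTextField(var text: String) {
    // Mimics a capped erase: removes at most ERASE_CAP characters per call,
    // silently ignoring any request beyond the cap.
    fun eraseCapped(count: Int) {
        val n = minOf(count, ERASE_CAP, text.length)
        text = text.dropLast(n)
    }
}

// Reads the field's real length and keeps erasing until nothing is left.
// Returns how many capped erase calls were needed.
fun clearCompletely(field: FakeTextField): Int {
    var calls = 0
    var remaining = field.text.length
    while (remaining > 0) {
        val chunk = minOf(remaining, ERASE_CAP)
        field.eraseCapped(chunk)
        remaining -= chunk
        calls++
    }
    return calls
}
```

With a 120-character value, one capped erase leaves 70 characters behind, while the loop above needs three calls to empty the field.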

What changed, by area

Agent behavior & verification

  • Target discipline (BlazeGoalPlanner, ReflectionNode): a new pre-execution
    check (detectTargetMissingRecovery) keeps the agent from tapping a distractor
    element when the step's named target isn't actually on screen. Instead, it scrolls
    to reveal the target when a scroll affordance exists, or surfaces a wrong-screen
    signal.
  • Verify steps must leave a real assertion (StepToolSet): the verify prompt now
    requires the agent to call an assertion tool that returns success (not just "look
    at the screen"), so the captured trail has a concrete assertion to replay
    deterministically.
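
The target-discipline decision described above can be sketched as a small pure function. This is an assumption-laden illustration, not the body of detectTargetMissingRecovery: ScreenElement, Recovery, and sketchTargetRecovery are hypothetical names, and matching a target by exact label is a simplification.

```kotlin
// Hypothetical model of one entry in the on-screen element list.
data class ScreenElement(
    val label: String,
    val tappable: Boolean,
    val scrollAffordance: Boolean = false,
)

// Hypothetical outcomes of the pre-execution check.
sealed interface Recovery {
    object Proceed : Recovery
    data class ScrollToReveal(val affordance: String) : Recovery
    object WrongScreen : Recovery
}

// Decide what to do before tapping: proceed only if the named target is
// actually tappable; otherwise scroll if possible, else flag a wrong screen.
fun sketchTargetRecovery(target: String, elements: List<ScreenElement>): Recovery {
    if (elements.any { it.tappable && it.label == target }) return Recovery.Proceed
    val affordance = elements.firstOrNull { it.scrollAffordance }
    return if (affordance != null) {
        Recovery.ScrollToReveal(affordance.label)
    } else {
        Recovery.WrongScreen
    }
}
```

The point of the shape is that "tap the nearest plausible element" is never one of the outcomes when the target is missing.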

assertVisible text verification + clearText

  • AssertVisibleTrailblazeTool / AssertVisibleBySelectorTrailblazeTool gain an
    optional expectedText plus a TextMatchMode (EXACT / PREFIX / REGEX).
    Default stays EXACT, and trails recorded before this change deserialize unchanged.
  • VolatileTextDetector rewrites a captured expectedText so a trailing volatile
    item-count subtitle is tolerated at replay (changed or gone), while the stable
    text stays an exact requirement.
  • New ClearTextTrailblazeTool (registered in the core_interaction toolset) and its
    generated reference doc.
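
As a rough illustration of the three match modes named above, the sketch below shows one plausible reading of their semantics. MatchMode and textMatches are hypothetical names, and treating REGEX as a full-string match is an assumption; the real TextMatchMode may differ.

```kotlin
// Hypothetical mirror of the EXACT / PREFIX / REGEX modes named above.
enum class MatchMode { EXACT, PREFIX, REGEX }

// Compare an element's rendered text with the expected text.
// Assumption: REGEX must match the whole string, not just a substring.
fun textMatches(
    actual: String,
    expected: String,
    mode: MatchMode = MatchMode.EXACT,
): Boolean = when (mode) {
    MatchMode.EXACT -> actual == expected
    MatchMode.PREFIX -> actual.startsWith(expected)
    MatchMode.REGEX -> Regex(expected).matches(actual)
}
```

Under these assumptions, a captured "Review sale\n3 items" could be replayed as a PREFIX match on "Review sale", or as a REGEX with an optional item-count line, so the stable text stays required while the trailing count may change or disappear. How VolatileTextDetector actually rewrites the expectation is not specified here.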

Android accessibility driver

  • IME-occlusion safety net (AccessibilityDeviceManager,
    TrailblazeAccessibilityService): pre-tap check against the live keyboard window
    bounds (with a dumpsys fallback when window enumeration is degraded); hideKeyboard
    is now best-effort on its post-check, with an env-gated SHOW_MODE_HIDDEN dismissal
    path (TRAILBLAZE_IME_DISMISS_VIA_SHOW_MODE) for Compose modals that consume Back.
  • Compact element list (AndroidCompactElementList): emits a non-tappable
    "(scroll … to reveal)" affordance for off-screen editable fields, and recurses into
    Compose interop containers that are marked invisible but whose children (modal
    content, overlay buttons) are individually visible.
  • Selector stability (TrailblazeNodeSelectorMinimizer, generator): an
    index-carrying selector keeps its most-stable anchor instead of collapsing to a
    naked positional index (which shifts whenever anything before the target changes);
    a genuinely attribute-less node still falls back to a bare index.
  • Tap-divergence logging (TapTrailblazeTool): logs when the OS hit-test winner
    at a tapped coordinate differs from the element the agent picked, making a class of
    "looked-correct" selector mismatches observable in every capture.

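The IME-occlusion safety net described above boils down to a point-in-rectangle check before the tap is dispatched. The following is a minimal sketch under stated assumptions: Bounds, TapBlockedByKeyboardException, and checkTapNotOnKeyboard are hypothetical names, and obtaining the live keyboard bounds (or the dumpsys fallback) is out of scope here.

```kotlin
// Hypothetical screen rectangle; right and bottom are exclusive.
data class Bounds(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    fun contains(x: Int, y: Int): Boolean =
        x in left until right && y in top until bottom
}

// Hypothetical failure type so callers can tell this case apart.
class TapBlockedByKeyboardException(message: String) : IllegalStateException(message)

// Fail loudly if the resolved tap point lands on the soft keyboard.
// keyboardBounds is null when no keyboard is currently shown.
fun checkTapNotOnKeyboard(x: Int, y: Int, keyboardBounds: Bounds?) {
    if (keyboardBounds != null && keyboardBounds.contains(x, y)) {
        throw TapBlockedByKeyboardException(
            "Tap at ($x, $y) falls inside the on-screen keyboard bounds $keyboardBounds"
        )
    }
}
```

Throwing rather than returning a flag matches the "fails loudly" intent: a tap that would land on the keyboard never reaches the device.
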
Concurrency & daemon isolation

  • TRAILBLAZE_HOME honored by getDefaultAppDataDirectory, SslConfig, and the MCP
    stdio proxy so concurrent daemons isolate their state dirs.
  • DaemonScriptedToolBundler: the stale-wrapper sweep only deletes wrapper files
    older than the esbuild timeout, so daemons in separate JVMs sharing a source dir
    don't delete each other's in-flight wrappers.
  • RunYamlRequestHandler: an originating HTTP-call cancellation now cancels the
    launched job rather than stalling to the handler timeout cap.
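
The TRAILBLAZE_HOME override above can be pictured as a simple environment lookup with a fallback. This is a sketch only: resolveStateDir is a hypothetical helper, not the signature of getDefaultAppDataDirectory, and treating a blank value as unset is an assumption.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// Resolve the daemon's state directory: TRAILBLAZE_HOME wins when set,
// otherwise fall back to the caller-supplied default location.
// The env map is injectable so the lookup can be exercised without
// touching the real process environment.
fun resolveStateDir(
    fallback: Path,
    env: Map<String, String> = System.getenv(),
): Path {
    val override = env["TRAILBLAZE_HOME"]?.takeIf { it.isNotBlank() }
    return if (override != null) Paths.get(override) else fallback
}
```

Two daemons started with different TRAILBLAZE_HOME values would then resolve different directories for logs and the TLS keystore, which is what removes the boot-time race.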

Security & install hygiene

  • AndroidHostAdbUtils.redactSecretsForLog: LLM provider auth tokens passed as
    trailblaze.llm.auth.token.<provider> args are redacted from logged adb shell
    commands (CI archives those logs as downloadable artifacts).
  • PrecompiledApkInstaller: the on-device APK-SHA marker write now uses printf
    instead of a sh -c "echo … > file" wrapper that got mangled by the argv space-join
    — the marker was silently blank, defeating server reuse and forcing a reinstall on
    every run. Now written verbatim and read back to verify.
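
For the log redaction described above, a regex-based sketch might look like the following. It is not the redactSecretsForLog implementation: redactForLog is a hypothetical name, and the assumption that the token follows the key after whitespace or "=" is a guess at the argument shape.

```kotlin
// Matches a trailblaze.llm.auth.token.<provider> key, its separator, and the value.
// Assumption: the secret is the next whitespace-free token after the key.
val TOKEN_ARG = Regex("""(trailblaze\.llm\.auth\.token\.[\w.-]+)([=\s]+)(\S+)""")

// Replace only the secret value, keeping the key visible for debugging.
fun redactForLog(command: String): String =
    TOKEN_ARG.replace(command) { m ->
        "${m.groupValues[1]}${m.groupValues[2]}<redacted>"
    }
```

Keeping the key in the log line preserves the information that a token was passed for a given provider, while the value itself never reaches archived CI artifacts.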

Scripted-tool (QuickJS) execution

  • QuickJsToolSerializer / QuickJsTrailblazeTool / LazyYamlScriptedToolRegistration:
    thread the session binding into the decoded tool so a nested client.callTool(...)
    from inside a bundled handler resolves its execution context on the LLM dispatch
    path (previously returned "no execution context installed").

Desktop run-config & reporting

  • TrailblazeDeviceManager: per-run overrides from the Run-config dialog (self-heal,
    recorded-steps replay, max LLM calls, memory seeds, video/logcat/iOS-log/network
    capture). All default to prior behavior, so existing callers are unaffected.
  • MainTrailblazeApp: an optional extraDaemonRoutes hook lets a downstream desktop
    edition register extra daemon endpoints without the lower layer depending on them.
  • Reporting: TrailblazeServerState.captureAnalytics; LogsRepo now retries surfacing
    a newly detected session until its first log is written and reuses cached info for
    completed sessions; test results gain a priority (P0/P1/P2) and Android/iOS
    build-version fields.

Dependencies, examples, docs

  • Add micrometer 1.15.12 and ktor-client-mock; refresh the runtime-classpath
    baselines (micrometer 1.13.4 → 1.15.12).
  • examples/playwright-electron: a provision-electron.sh step fetches the Electron
    platform binary explicitly (npm ≥ 11.16 no longer runs Electron's postinstall).
  • Regenerated tool-reference docs (assertVisible expectedText, new clearText,
    updated toolset counts).

Test plan

  • OSS CI green (build + unit tests across modules)
  • spotlessApply clean
  • Sensitive-terms scan clean (run pre-sync)

handstandsam force-pushed the upstream-2026.06.15 branch from 59d9884 to 817ccfc on June 15, 2026 16:28
handstandsam merged commit b1d5351 into main on Jun 15, 2026
12 of 14 checks passed
handstandsam deleted the upstream-2026.06.15 branch on June 15, 2026 17:04