Upstream 2026.06.15 #149
Merged
Summary
Syncs the latest internal changes into the open-source repo (through 2026.06.15).
52 files change (9 new). The headline themes for anyone consuming Trailblaze:
- `assertVisible` can now check an element's text, not just its presence. Pass the new optional `expectedText` to assert the rendered value (e.g. "expect the checkout button to read `Charge $5.00`", "expect status to be `Active`"). Volatile trailing state like a live item count (`Review sale\n3 items`) is detected at capture time and matched loosely so it doesn't make replays brittle.
- A new `clearText` tool wipes the focused text field completely before you type. The bare Maestro `eraseText` silently caps at 50 characters on every driver, so long values (passwords, addresses, search queries) used to survive a "clear" and concatenate onto the next input. `clearText` reads the field's real length and erases exactly that many characters.
- When a step names a target that isn't among the tappable elements, the agent now scrolls to reveal it (if there's a scroll affordance) or stops with a clear "you may be on the wrong screen" signal, instead of tapping an unrelated nearby element and marching on.
- When a resolved tap coordinate falls inside the on-screen keyboard's bounds (which happens on modal screens that swallow the Back gesture and refuse to dismiss the keyboard), the tap now fails loudly with a clear message instead of silently landing on the keyboard.
- A new `TRAILBLAZE_HOME` environment variable isolates each daemon's state directory (logs, TLS keystore) so concurrent daemons don't race a shared keystore on boot.
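The `clearText` behavior described above can be illustrated with a small simulation. This is a sketch in Python, not Trailblaze's implementation: `FakeField`, `erase_text`, and `clear_text` are hypothetical names, and the only fact taken from the description is that a bare erase stops at 50 characters.

```python
from typing import Optional

DEFAULT_ERASE_CAP = 50  # cap described above for a bare eraseText


class FakeField:
    """Hypothetical stand-in for a focused text field."""

    def __init__(self, text: str = "") -> None:
        self.text = text

    def erase(self, count: int) -> None:
        # Remove `count` characters from the end, like repeated backspaces.
        self.text = self.text[: max(0, len(self.text) - count)]

    def type(self, value: str) -> None:
        self.text += value


def erase_text(field: FakeField, count: Optional[int] = None) -> None:
    """Models a bare erase: with no count it removes at most 50 characters."""
    field.erase(DEFAULT_ERASE_CAP if count is None else count)


def clear_text(field: FakeField) -> None:
    """Models the clearText idea: read the real length, erase exactly that many."""
    erase_text(field, count=len(field.text))
```

With an 80-character value, `erase_text(field)` followed by `field.type("new")` leaves 30 stale characters in front of the new input, while `clear_text(field)` followed by the same `type` call leaves only `"new"`.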
What changed, by area
Agent behavior & verification
- `BlazeGoalPlanner`, `ReflectionNode`: a new pre-execution check (`detectTargetMissingRecovery`) keeps the agent from tapping a distractor element when the step's named target isn't actually on screen. It scrolls toward a revealed affordance or surfaces a wrong-screen signal.
- `StepToolSet`: the verify prompt now requires the agent to call an assertion tool that returns success (not just "look at the screen"), so the captured trail has a concrete assertion to replay deterministically.
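The pre-execution check can be pictured as a three-way decision. The sketch below is an assumption about its shape, written in Python for illustration: the `Recovery` enum, the function signature, and the case-insensitive substring match are all hypothetical and are not taken from `detectTargetMissingRecovery` itself.

```python
from enum import Enum
from typing import Sequence


class Recovery(Enum):
    PROCEED = "proceed"            # target is present, tap as planned
    SCROLL_TO_REVEAL = "scroll"    # target missing, but the screen can scroll
    WRONG_SCREEN = "wrong_screen"  # target missing and nothing to scroll


def detect_target_missing_recovery(
    target: str,
    tappable_labels: Sequence[str],
    has_scroll_affordance: bool,
) -> Recovery:
    """Decide what to do before tapping, given the step's named target."""
    wanted = target.strip().lower()
    # Assumed matching rule: case-insensitive substring of a tappable label.
    if any(wanted in label.lower() for label in tappable_labels):
        return Recovery.PROCEED
    if has_scroll_affordance:
        return Recovery.SCROLL_TO_REVEAL
    return Recovery.WRONG_SCREEN
```

The point of the ordering is that a missing target never falls through to a tap: the only outcomes are scrolling or stopping with a wrong-screen signal.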
assertVisible text verification + clearText
- `AssertVisibleTrailblazeTool` / `AssertVisibleBySelectorTrailblazeTool` gain an optional `expectedText` plus a `TextMatchMode` (`EXACT` / `PREFIX` / `REGEX`). Default stays `EXACT`, and trails recorded before this change deserialize unchanged.
- `VolatileTextDetector` rewrites a captured `expectedText` so a trailing volatile item-count subtitle is tolerated at replay (changed or gone), while the stable text stays an exact requirement.
- `ClearTextTrailblazeTool` (registered in the `core_interaction` toolset) and its generated reference doc.
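A rough Python sketch of how the three match modes and the volatile-suffix rewrite could fit together. Only the mode names come from the description above. The function names, the full-match semantics for `REGEX`, and the `\d+ items?` detection pattern are assumptions made for illustration and may differ from what `VolatileTextDetector` does.

```python
import re
from enum import Enum
from typing import Tuple


class TextMatchMode(Enum):
    EXACT = "EXACT"
    PREFIX = "PREFIX"
    REGEX = "REGEX"


def text_matches(actual: str, expected: str,
                 mode: TextMatchMode = TextMatchMode.EXACT) -> bool:
    """Compare rendered text against expectedText under the given mode."""
    if mode is TextMatchMode.EXACT:
        return actual == expected
    if mode is TextMatchMode.PREFIX:
        return actual.startswith(expected)
    # Assumption: REGEX must match the whole rendered text.
    return re.fullmatch(expected, actual) is not None


def loosen_volatile_suffix(captured: str) -> Tuple[str, TextMatchMode]:
    """Rewrite a captured value so a trailing item count may change or vanish."""
    m = re.fullmatch(r"(?P<stable>.+)\n\d+ items?", captured, flags=re.DOTALL)
    if m is None:
        return captured, TextMatchMode.EXACT
    # The stable part stays literal; the count subtitle becomes optional.
    pattern = re.escape(m.group("stable")) + r"(?:\n\d+ items?)?"
    return pattern, TextMatchMode.REGEX
```

Under these assumptions, a value captured as `Review sale\n3 items` still matches `Review sale\n12 items` and plain `Review sale` at replay, but not `Review order\n3 items`.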
Android accessibility driver
- `AccessibilityDeviceManager`, `TrailblazeAccessibilityService`: pre-tap check against the live keyboard window bounds (with a `dumpsys` fallback when window enumeration is degraded); `hideKeyboard` is now best-effort on its post-check, with an env-gated `SHOW_MODE_HIDDEN` dismissal path (`TRAILBLAZE_IME_DISMISS_VIA_SHOW_MODE`) for Compose modals that consume Back.
- `AndroidCompactElementList`: emits a non-tappable "(scroll … to reveal)" affordance for off-screen editable fields, and recurses into Compose interop containers that are marked invisible but whose children (modal content, overlay buttons) are individually visible.
- `TrailblazeNodeSelectorMinimizer`, generator: an index-carrying selector keeps its most-stable anchor instead of collapsing to a naked positional index (which shifts whenever anything before the target changes); a genuinely attribute-less node still falls back to a bare index.
- `TapTrailblazeTool`: logs when the OS hit-test winner at a tapped coordinate differs from the element the agent picked, making a class of "looked-correct" selector mismatches observable in every capture.
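The pre-tap keyboard check reduces to a point-in-rectangle test followed by a loud failure. A minimal Python sketch, assuming the keyboard bounds are already known: `Rect`, `TapOnKeyboardError`, and `check_tap_not_on_keyboard` are hypothetical names, and the error wording is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        # Right and bottom edges are treated as exclusive.
        return self.left <= x < self.right and self.top <= y < self.bottom


class TapOnKeyboardError(RuntimeError):
    """Raised instead of letting a tap land on the on-screen keyboard."""


def check_tap_not_on_keyboard(x: int, y: int,
                              keyboard_bounds: Optional[Rect]) -> None:
    """Fail loudly if (x, y) falls inside the visible keyboard window."""
    if keyboard_bounds is not None and keyboard_bounds.contains(x, y):
        raise TapOnKeyboardError(
            f"Tap at ({x}, {y}) falls inside the on-screen keyboard "
            f"{keyboard_bounds}; dismiss the keyboard first."
        )
```

Passing `None` for the bounds models the case where no keyboard window is showing, in which case the tap proceeds unchanged.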
Concurrency & daemon isolation
- `TRAILBLAZE_HOME` honored by `getDefaultAppDataDirectory`, `SslConfig`, and the MCP stdio proxy so concurrent daemons isolate their state dirs.
- `DaemonScriptedToolBundler`: the stale-wrapper sweep only deletes wrapper files older than the esbuild timeout, so daemons in separate JVMs sharing a source dir don't delete each other's in-flight wrappers.
- `RunYamlRequestHandler`: an originating HTTP-call cancellation now cancels the launched job rather than stalling to the handler timeout cap.
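The age-gated sweep can be sketched as follows. This is a Python illustration of the idea, not the bundler's code: the function name, the `*.wrapper.js` glob, and the use of file modification time as the age signal are assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional


def sweep_stale_wrappers(
    directory: Path,
    max_age_seconds: float,
    now: Optional[float] = None,
    pattern: str = "*.wrapper.js",  # hypothetical wrapper naming
) -> List[Path]:
    """Delete only wrappers older than the build timeout; return what was removed."""
    now = time.time() if now is None else now
    deleted: List[Path] = []
    for path in sorted(directory.glob(pattern)):
        try:
            age = now - path.stat().st_mtime
            if age > max_age_seconds:
                path.unlink()
                deleted.append(path)
        except FileNotFoundError:
            # Another process removed it between listing and stat/unlink.
            continue
    return deleted
```

A wrapper younger than the timeout may still be in use by a build running in another process, so it is left alone. Anything older than the timeout can no longer belong to a live build and is safe to remove.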
Security & install hygiene
- `AndroidHostAdbUtils.redactSecretsForLog`: LLM provider auth tokens passed as `trailblaze.llm.auth.token.<provider>` args are redacted from logged adb shell commands (CI archives those logs as downloadable artifacts).
- `PrecompiledApkInstaller`: the on-device APK-SHA marker write now uses `printf` instead of a `sh -c "echo … > file"` wrapper that got mangled by the argv space-join. The marker was silently blank, defeating server reuse and forcing a reinstall on every run. It is now written verbatim and read back to verify.
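Redacting the token args before logging can be sketched with a single regular expression. This Python version is illustrative only: the key prefix comes from the description above, but the assumption that the value follows the key after a space or `=`, the `***` placeholder, and the function name are not taken from `redactSecretsForLog`.

```python
import re

# Key, separator (space or "="), then the secret value up to the next whitespace.
_TOKEN_ARG = re.compile(r"(trailblaze\.llm\.auth\.token\.[\w-]+)([ =]+)(\S+)")


def redact_secrets_for_log(command: str) -> str:
    """Return the command with auth-token values replaced by a placeholder."""
    return _TOKEN_ARG.sub(r"\1\2***", command)
```

The key name is kept in the log so it stays clear which provider's token was passed. Only the value is masked, and arguments that don't carry the token prefix are left untouched.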
Scripted-tool (QuickJS) execution
- `QuickJsToolSerializer` / `QuickJsTrailblazeTool` / `LazyYamlScriptedToolRegistration`: thread the session binding into the decoded tool so a nested `client.callTool(...)` from inside a bundled handler resolves its execution context on the LLM dispatch path (previously returned "no execution context installed").
Desktop run-config & reporting
- `TrailblazeDeviceManager`: per-run overrides from the Run-config dialog (self-heal, recorded-steps replay, max LLM calls, memory seeds, video/logcat/iOS-log/network capture). All default to prior behavior, so existing callers are unaffected.
- `MainTrailblazeApp`: an optional `extraDaemonRoutes` hook lets a downstream desktop edition register extra daemon endpoints without the lower layer depending on them.
- `TrailblazeServerState.captureAnalytics`, `LogsRepo` (retry surfacing a newly detected session until its first log is written; reuse cached info for completed sessions), and test-result `priority` (P0/P1/P2) + Android/iOS build-version fields.
Dependencies, examples, docs
- `micrometer` 1.15.12 and `ktor-client-mock`; refresh the runtime-classpath baselines (micrometer 1.13.4 → 1.15.12).
- `examples/playwright-electron`: a `provision-electron.sh` step fetches the Electron platform binary explicitly (npm ≥ 11.16 no longer runs Electron's postinstall).
- Docs (`assertVisible` `expectedText`, new `clearText`, updated toolset counts).
Test plan
- `spotlessApply` clean