Status: Draft
Date: 2026-02-05
Goal: Reduce round-trips for multi-action workflows and provide intelligent state change detection
A simple login takes three action calls plus a fourth call to observe the result:
AI → set_value(#username, "user") → "Set value" → AI
AI → set_value(#password, "pass") → "Set value" → AI
AI → click_element(#submit) → "Clicked element" → AI
AI → get_visible_elements() → [elements...] → AI
Issues:
- Wasted round-trips: the AI already knows the full sequence, so intermediate responses carry no information
- No state feedback: tools return "Clicked element" but not whether anything changed
- Manual polling: the AI must call get_visible_elements to see what happened
- Latency: each round-trip adds network and inference latency
AI → execute_sequence([
{ action: "set_value", selector: "#username", value: "user" },
{ action: "set_value", selector: "#password", value: "pass" },
{ action: "click_element", selector: "#submit" }
])
→ { completed: 3, stateChange: { url: { from: "/login", to: "/dashboard" }, appeared: [...] } }
→ AI
One round-trip. State delta computed automatically.
Located in src/tools/interaction.tool.ts
interface Action {
action: 'set_value' | 'click_element' | 'tap_element' | 'navigate' | 'scroll' | 'swipe';
selector?: string;
value?: string;
url?: string;
direction?: 'up' | 'down' | 'left' | 'right';
pixels?: number;
}
interface SequenceOptions {
actions: Action[];
sessionId?: string; // For future multi-session support
stabilityMs?: number; // How long state must be unchanged (default: 500)
pollIntervalMs?: number; // How often to check stability (default: 100)
timeoutMs?: number; // Max wait for stability (default: 5000)
verbose?: boolean; // Return per-action results (default: false)
}

interface StateDelta {
url?: { from: string; to: string };
title?: { from: string; to: string };
appeared: ElementSummary[];
disappeared: ElementSummary[];
changed: ElementChange[];
}
interface ElementSummary {
selector: string;
tagName: string;
text?: string; // Truncated to ~50 chars
}
interface ElementChange {
selector: string;
field: 'textContent' | 'value' | 'className';
from: string;
to: string;
}

interface SequenceResult {
completed: number; // How many actions succeeded
failed?: {
index: number;
action: string;
error: string;
};
stateChange: StateDelta | null; // null if no changes detected
stabilityWaitMs: number; // How long we waited for stability
// Only if verbose: true
steps?: {
action: string;
result: 'ok' | 'error';
durationMs: number;
}[];
}

After clicking a button, the page might:
- Navigate (URL change)
- Show a loading spinner
- Fetch data and render new elements
- Display an error message
We need to wait for the page to "settle" before computing the final delta.
1. Capture "before" state (elements, URL, title)
2. Execute all actions in sequence
3. Enter stability loop:
a. If total time > timeoutMs → exit loop (timeout)
b. Wait pollIntervalMs, then capture the current signature
c. Compare it to the previous capture (using key signals)
d. If different → reset the stability timer, goto 3a
e. If the same and unchanged for stabilityMs → exit loop (stable)
f. Otherwise → goto 3a
4. Capture "after" state (full element list)
5. Compute delta between "before" and "after"
6. Return result
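The stability loop in step 3 can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the implementation: `waitForStability`, `StabilityDeps`, and the injected `capture`, `same`, `sleep`, and `now` functions are hypothetical names, and the clock is injected so the logic can run without a browser. Defaults mirror SequenceOptions (500 / 100 / 5000 ms). The timeout is checked on every iteration, so a page that never stops changing still exits.

```typescript
// Hypothetical sketch of the polling loop; all names are illustrative.
interface StabilityOptions {
  stabilityMs?: number;    // how long the signature must stay unchanged (default 500)
  pollIntervalMs?: number; // delay between captures (default 100)
  timeoutMs?: number;      // upper bound on the whole wait (default 5000)
}

interface StabilityDeps<S> {
  capture: () => Promise<S>;             // cheap signature capture (assumed)
  same: (a: S, b: S) => boolean;         // signature comparison
  sleep: (ms: number) => Promise<void>;  // injected so tests can fake time
  now: () => number;                     // current time in milliseconds
}

interface StabilityOutcome<S> {
  stable: boolean;
  waitedMs: number; // would feed SequenceResult.stabilityWaitMs
  last: S;          // last signature seen, useful for timeout diagnostics
}

async function waitForStability<S>(
  deps: StabilityDeps<S>,
  opts: StabilityOptions = {},
): Promise<StabilityOutcome<S>> {
  const stabilityMs = opts.stabilityMs ?? 500;
  const pollIntervalMs = opts.pollIntervalMs ?? 100;
  const timeoutMs = opts.timeoutMs ?? 5000;

  const start = deps.now();
  let previous = await deps.capture();
  let stableSince = start; // time of the last observed change

  while (true) {
    const elapsed = deps.now() - start;
    // Step 3a: checked first, so constant churn cannot loop forever.
    if (elapsed >= timeoutMs) {
      return { stable: false, waitedMs: elapsed, last: previous };
    }
    await deps.sleep(pollIntervalMs);
    const current = await deps.capture();
    if (!deps.same(previous, current)) {
      // Step 3d: a change restarts the stability window.
      stableSince = deps.now();
    } else if (deps.now() - stableSince >= stabilityMs) {
      // Step 3e: unchanged for the full window.
      return { stable: true, waitedMs: deps.now() - start, last: current };
    }
    previous = current;
  }
}
```

In the tool, `capture` would return the cheap signature described below, and the full element list would only be captured before and after the loop.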
Instead of comparing all elements on every poll (expensive), check key signals:
interface StateSignature {
url: string;
title: string;
elementCount: number;
hasLoadingIndicator: boolean; // .loading, [aria-busy="true"], .spinner, etc.
documentReady: boolean; // document.readyState === 'complete'
pendingRequests?: number; // Optional: needs fetch/XHR instrumentation (PerformanceObserver only reports completed requests)
}

If signature unchanged for stabilityMs, the page is stable.
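One way to define "unchanged" for these signals, sketched with a hypothetical `isSettledSample` helper that could serve as the comparison in step 3c: two consecutive samples count as settled only if every key signal matches and the current sample shows no loading indicator, a complete document, and no pending requests. `pendingRequests` is treated as optional here because it is only available when request tracking exists.

```typescript
// Hypothetical helper; field names follow StateSignature.
interface StateSignature {
  url: string;
  title: string;
  elementCount: number;
  hasLoadingIndicator: boolean;
  documentReady: boolean;
  pendingRequests?: number; // only present when request tracking is available
}

function isSettledSample(prev: StateSignature, curr: StateSignature): boolean {
  // A visible loader, an incomplete document, or in-flight requests never
  // count as settled, even if nothing else moved between polls.
  if (curr.hasLoadingIndicator || !curr.documentReady) return false;
  if ((curr.pendingRequests ?? 0) > 0) return false;
  // Otherwise every key signal must match the previous sample.
  return (
    prev.url === curr.url &&
    prev.title === curr.title &&
    prev.elementCount === curr.elementCount &&
    prev.hasLoadingIndicator === curr.hasLoadingIndicator &&
    prev.documentReady === curr.documentReady
  );
}
```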
const LOADING_SELECTORS = [
'.loading',
'.spinner',
'[aria-busy="true"]',
'[data-loading="true"]',
'.skeleton',
'[class*="loading"]',
'[class*="spinner"]',
];

If any loading indicator is visible, page is not stable (continue waiting).
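A sketch of the visibility check, assuming the detector runs in the page context. `hasVisibleLoadingIndicator`, `QueryRoot`, and `RectElement` are hypothetical; the small structural types let it accept `document` in a browser or simple test doubles. Visibility is approximated with `getClientRects().length > 0`, which excludes `display: none` and detached elements but still counts elements hidden with `visibility: hidden` or `opacity: 0`, so it is a heuristic.

```typescript
// Same selector list as above.
const LOADING_SELECTORS = [
  '.loading',
  '.spinner',
  '[aria-busy="true"]',
  '[data-loading="true"]',
  '.skeleton',
  '[class*="loading"]',
  '[class*="spinner"]',
];

// Minimal structural types (hypothetical) so a real Document or a fake works.
interface RectElement {
  getClientRects(): { length: number };
}
interface QueryRoot {
  querySelectorAll(selectors: string): ArrayLike<RectElement>;
}

function hasVisibleLoadingIndicator(root: QueryRoot): boolean {
  // One combined selector list means one DOM query per poll.
  const matches = root.querySelectorAll(LOADING_SELECTORS.join(', '));
  for (let i = 0; i < matches.length; i++) {
    // Elements that generate no boxes (display: none, detached) have no rects.
    if (matches[i].getClientRects().length > 0) return true;
  }
  return false;
}
```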
src/
├── tools/
│ └── interaction.tool.ts # New file: execute_sequence tool
├── utils/
│ ├── state-capture.ts # Capture page state (elements, URL, title)
│ ├── state-diff.ts # Compute delta between two states
│ └── stability-detector.ts # Polling loop for stability detection
└── types/
└── interaction.types.ts # Shared types for sequence/delta
- Create interaction.tool.ts with basic execute_sequence
- Implement action dispatch (reuse existing tool logic)
- Capture before/after state using getElements
- Compute simple delta (appeared/disappeared by selector)
- Implement stability-detector.ts with polling loop
- Add loading indicator detection
- Add configurable timeouts
- Handle edge cases (infinite loading, rapid changes)
- Add changed detection (same element, different content)
- Add URL/title change tracking
- Add verbose mode for debugging
- Optimize performance (signature-based fast path)
- Add sessionId parameter to target specific sessions
- Enable parallel sequences across sessions
- Coordinate with sub-agent architecture
execute_sequence({
actions: [
{ action: 'set_value', selector: '#email', value: 'user@example.com' },
{ action: 'set_value', selector: '#password', value: 'secret123' },
{ action: 'click_element', selector: '#login-button' }
]
})

Response:
{
"completed": 3,
"stateChange": {
"url": { "from": "/login", "to": "/dashboard" },
"appeared": [
{ "selector": "#welcome-message", "tagName": "h1", "text": "Welcome back!" },
{ "selector": "#user-menu", "tagName": "nav" }
],
"disappeared": [
{ "selector": "#login-form", "tagName": "form" }
],
"changed": []
},
"stabilityWaitMs": 650
}

execute_sequence({
actions: [
{ action: 'set_value', selector: '#email', value: 'invalid-email' },
{ action: 'click_element', selector: '#submit' }
]
})

Response:
{
"completed": 2,
"stateChange": {
"appeared": [
{ "selector": ".error-message", "tagName": "div", "text": "Please enter a valid email" }
],
"disappeared": [],
"changed": [
{ "selector": "#email", "field": "className", "from": "input", "to": "input error" }
]
},
"stabilityWaitMs": 520
}

execute_sequence({
actions: [
{ action: 'set_value', selector: '#username', value: 'test' },
{ action: 'click_element', selector: '#nonexistent-button' },
{ action: 'set_value', selector: '#other-field', value: 'never reached' }
]
})

Response:
{
"completed": 1,
"failed": {
"index": 1,
"action": "click_element",
"error": "Element not found: #nonexistent-button"
},
"stateChange": {
"appeared": [],
"disappeared": [],
"changed": [
{ "selector": "#username", "field": "value", "from": "", "to": "test" }
]
},
"stabilityWaitMs": 100
}

execute_sequence({
actions: [
{ action: 'navigate', url: 'https://example.com' },
{ action: 'click_element', selector: '#menu-toggle' },
{ action: 'click_element', selector: '#settings-link' }
],
verbose: true
})

Response:
{
"completed": 3,
"stateChange": { "url": { "from": "/", "to": "/settings" }, ... },
"stabilityWaitMs": 1200,
"steps": [
{ "action": "navigate", "result": "ok", "durationMs": 450 },
{ "action": "click_element", "result": "ok", "durationMs": 85 },
{ "action": "click_element", "result": "ok", "durationMs": 92 }
]
}

Some actions (like set_value) rarely cause async changes, so we could skip the stability check for them:
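{ action: 'set_value', selector: '#name', value: 'test', skipStability: true }

Concern: Adds complexity, and since stability is only checked once, after the last action, a per-action flag would only matter for the final action. Maybe just rely on the final stability check.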
Options:
- Hard timeout (current approach): returns partial delta
- Detect specific loading patterns: report "page still loading"
- Let AI decide: return { stable: false, reason: 'loading indicator visible' }
Recommendation: Timeout with diagnostic info about why we timed out.
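The recommendation could be implemented by inspecting the last two signatures seen before the timeout. A sketch with a hypothetical `diagnoseTimeout` helper that produces the `{ stable: false, reason }` shape from the third option; the signature fields follow StateSignature, with `pendingRequests` assumed optional.

```typescript
// Hypothetical types and helper, for illustration only.
interface StateSignature {
  url: string;
  title: string;
  elementCount: number;
  hasLoadingIndicator: boolean;
  documentReady: boolean;
  pendingRequests?: number;
}

interface TimeoutDiagnostic {
  stable: false;
  reason: string;
}

// prev and last are the final two samples taken before timeoutMs elapsed.
function diagnoseTimeout(prev: StateSignature, last: StateSignature): TimeoutDiagnostic {
  const reasons: string[] = [];
  // Conditions that block stability outright.
  if (last.hasLoadingIndicator) reasons.push('loading indicator visible');
  if (!last.documentReady) reasons.push('document not ready');
  const pending = last.pendingRequests ?? 0;
  if (pending > 0) reasons.push(`${pending} request(s) still pending`);
  // Signals that were still moving between the last two polls.
  if (prev.url !== last.url) reasons.push('URL still changing');
  if (prev.title !== last.title) reasons.push('title still changing');
  if (prev.elementCount !== last.elementCount) reasons.push('element count still changing');
  return {
    stable: false,
    reason: reasons.length > 0 ? reasons.join('; ') : 'signature did not stay unchanged for stabilityMs',
  };
}
```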
Current getElements filters to viewport by default. For delta:
- Viewport only = might miss elements that scrolled in/out
- Full page = more accurate but larger payload
Recommendation: Full page for delta computation, but truncate to top N appeared/disappeared.
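A minimal sketch of that recommendation: diff full-page snapshots by selector, then cap each list. `ElementSnapshot`, `computeDelta`, and the default `maxItems = 20` are hypothetical (the design does not fix N), and the sketch assumes every captured element has a unique, stable selector. The output shapes mirror ElementSummary and ElementChange, including the ~50-character text truncation.

```typescript
// Hypothetical full-capture record for one element.
interface ElementSnapshot {
  selector: string;
  tagName: string;
  textContent: string;
  value: string;
  className: string;
}
interface ElementSummary { selector: string; tagName: string; text?: string }
interface ElementChange {
  selector: string;
  field: 'textContent' | 'value' | 'className';
  from: string;
  to: string;
}
interface ElementDelta { appeared: ElementSummary[]; disappeared: ElementSummary[]; changed: ElementChange[] }

const TRACKED_FIELDS = ['textContent', 'value', 'className'] as const;

function summarize(el: ElementSnapshot): ElementSummary {
  const text = el.textContent.trim();
  if (!text) return { selector: el.selector, tagName: el.tagName };
  // Keep summaries short: first 50 chars plus an ellipsis.
  return { selector: el.selector, tagName: el.tagName, text: text.length > 50 ? text.slice(0, 50) + '…' : text };
}

function computeDelta(before: ElementSnapshot[], after: ElementSnapshot[], maxItems = 20): ElementDelta {
  const prev = new Map(before.map((e): [string, ElementSnapshot] => [e.selector, e]));
  const next = new Map(after.map((e): [string, ElementSnapshot] => [e.selector, e]));
  const appeared: ElementSummary[] = [];
  const changed: ElementChange[] = [];
  for (const el of after) {
    const old = prev.get(el.selector);
    if (!old) {
      appeared.push(summarize(el)); // selector not present before
      continue;
    }
    // Same selector on both sides: report per-field content changes.
    for (const field of TRACKED_FIELDS) {
      if (old[field] !== el[field]) {
        changed.push({ selector: el.selector, field, from: old[field], to: el[field] });
      }
    }
  }
  const disappeared = before.filter((e) => !next.has(e.selector)).map(summarize);
  // Truncate each list to keep the payload bounded.
  return {
    appeared: appeared.slice(0, maxItems),
    disappeared: disappeared.slice(0, maxItems),
    changed: changed.slice(0, maxItems),
  };
}
```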
Two comparison strategies:
- Full diff: Compare all elements every poll (accurate, expensive)
- Key signals: Compare signature only during polling, full diff only at end (fast, might miss rapid changes)
Recommendation: Key signals for polling, full diff once at end.
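Putting that recommendation together: the expensive full capture runs only before the first action and after the page settles, while polling in between uses the cheap signature. This hypothetical `executeSequence` takes every page operation as an injected function, so `runAction`, `captureFull`, `waitForStability`, `diff`, and `describe` are placeholders rather than existing APIs. It stops at the first failing action but still waits and diffs, matching the failure example where a partial `stateChange` is reported.

```typescript
// Hypothetical orchestration; every page operation is injected.
interface SequenceDeps<A, Snap, Delta> {
  runAction: (action: A) => Promise<void>;                // throws on failure
  captureFull: () => Promise<Snap>;                       // expensive: full element list
  waitForStability: () => Promise<{ waitedMs: number }>;  // cheap: signature polling
  diff: (before: Snap, after: Snap) => Delta | null;      // null when nothing changed
  describe: (action: A) => string;                        // e.g. the `action` field
}

interface SequenceOutcome<Delta> {
  completed: number;
  failed?: { index: number; action: string; error: string };
  stateChange: Delta | null;
  stabilityWaitMs: number;
}

async function executeSequence<A, Snap, Delta>(
  actions: A[],
  deps: SequenceDeps<A, Snap, Delta>,
): Promise<SequenceOutcome<Delta>> {
  const before = await deps.captureFull(); // full capture #1
  let completed = 0;
  let failed: SequenceOutcome<Delta>['failed'] = undefined;

  for (let i = 0; i < actions.length; i++) {
    try {
      await deps.runAction(actions[i]);
      completed++;
    } catch (err) {
      failed = {
        index: i,
        action: deps.describe(actions[i]),
        error: err instanceof Error ? err.message : String(err),
      };
      break; // remaining actions are skipped
    }
  }

  // Earlier actions may still have changed the page, so wait and diff anyway.
  const { waitedMs } = await deps.waitForStability();
  const after = await deps.captureFull(); // full capture #2
  const result: SequenceOutcome<Delta> = {
    completed,
    stateChange: deps.diff(before, after),
    stabilityWaitMs: waitedMs,
  };
  if (failed) result.failed = failed;
  return result;
}
```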
Should we support:
{ action: 'click_element', selector: '#cookie-banner', optional: true }

Concern: Scope creep. The AI can handle conditionals itself. Keep the tool simple.
execute_sequence complements existing tools:
- Simple single actions still use click_element, set_value, etc.
- Complex workflows use execute_sequence
- No breaking changes to existing tools
Works identically for mobile sessions:
execute_sequence({
actions: [
{ action: 'tap_element', selector: '~loginButton' },
{ action: 'set_value', selector: '~usernameField', value: 'test' },
{ action: 'swipe', direction: 'up' }
]
})

When multi-session support lands:
execute_sequence({
sessionId: 'user-a',
actions: [...]
})