Interaction Sequencing Architecture Proposal

Status: Draft Date: 2026-02-05 Goal: Reduce round-trips for multi-action workflows and provide intelligent state change detection

Problem Statement

Current Flow (Inefficient)

A simple login requires 3+ tool calls:

AI → set_value(#username, "user")  → "Set value"        → AI
AI → set_value(#password, "pass")  → "Set value"        → AI
AI → click_element(#submit)        → "Clicked element"  → AI
AI → get_visible_elements()        → [elements...]      → AI

Issues:

Wasted round-trips — The AI already knows the full sequence; intermediate responses carry no information
No state feedback — Tools return "Clicked element" but not whether anything changed
Manual polling — AI must call get_visible_elements to see what happened
Latency — Each round-trip adds network + inference latency

Desired Flow

AI → execute_sequence([
       { action: "set_value", selector: "#username", value: "user" },
       { action: "set_value", selector: "#password", value: "pass" },
       { action: "click_element", selector: "#submit" }
     ])
   → { completed: 3, stateChange: { navigation: "/login" → "/dashboard", appeared: [...] } }
   → AI

One round-trip. State delta computed automatically.

Proposed Design

New Tool: `execute_sequence`

Located in src/tools/interaction.tool.ts

interface Action {
  action: 'set_value' | 'click_element' | 'tap_element' | 'navigate' | 'scroll' | 'swipe';
  selector?: string;
  value?: string;
  url?: string;
  direction?: 'up' | 'down' | 'left' | 'right';
  pixels?: number;
}

interface SequenceOptions {
  actions: Action[];
  sessionId?: string;           // For future multi-session support
  stabilityMs?: number;         // How long state must be unchanged (default: 500)
  pollIntervalMs?: number;      // How often to check stability (default: 100)
  timeoutMs?: number;           // Max wait for stability (default: 5000)
  verbose?: boolean;            // Return per-action results (default: false)
}

State Delta Structure

interface StateDelta {
  url?: { from: string; to: string };
  title?: { from: string; to: string };
  appeared: ElementSummary[];
  disappeared: ElementSummary[];
  changed: ElementChange[];
}

interface ElementSummary {
  selector: string;
  tagName: string;
  text?: string;          // Truncated to ~50 chars
}

interface ElementChange {
  selector: string;
  field: 'textContent' | 'value' | 'className';
  from: string;
  to: string;
}

Response Structure

interface SequenceResult {
  completed: number;                    // How many actions succeeded
  failed?: {
    index: number;
    action: string;
    error: string;
  };
  stateChange: StateDelta | null;       // null if no changes detected
  stabilityWaitMs: number;              // How long we waited for stability

  // Only if verbose: true
  steps?: {
    action: string;
    result: 'ok' | 'error';
    durationMs: number;
  }[];
}

Stability Detection Algorithm

Why Stability Matters

After clicking a button, the page might:

Navigate (URL change)
Show a loading spinner
Fetch data and render new elements
Display an error message

We need to wait for the page to "settle" before computing the final delta.

Algorithm

1. Capture "before" state (elements, URL, title)
2. Execute all actions in sequence
3. Enter stability loop:
   a. Capture current state
   b. Compare to previous capture (using key signals)
   c. If different → reset stability timer, goto 3a
   d. If same → increment stability counter
   e. If stable for stabilityMs → exit loop
   f. If total time > timeoutMs → exit loop (timeout)
4. Capture "after" state (full element list)
5. Compute delta between "before" and "after"
6. Return result

Key Signals for Stability Check (Fast Path)

Instead of comparing all elements on every poll (expensive), check key signals:

interface StateSignature {
  url: string;
  title: string;
  elementCount: number;
  hasLoadingIndicator: boolean;  // .loading, [aria-busy="true"], .spinner, etc.
  documentReady: boolean;        // document.readyState === 'complete'
  pendingRequests: number;       // If using performance observer
}

If signature unchanged for stabilityMs, the page is stable.

Loading Indicator Detection

const LOADING_SELECTORS = [
  '.loading',
  '.spinner',
  '[aria-busy="true"]',
  '[data-loading="true"]',
  '.skeleton',
  '[class*="loading"]',
  '[class*="spinner"]',
];

If any loading indicator is visible, page is not stable (continue waiting).

Implementation Plan

File Structure

src/
├── tools/
│   └── interaction.tool.ts      # New file: execute_sequence tool
├── utils/
│   ├── state-capture.ts         # Capture page state (elements, URL, title)
│   ├── state-diff.ts            # Compute delta between two states
│   └── stability-detector.ts    # Polling loop for stability detection
└── types/
    └── interaction.types.ts     # Shared types for sequence/delta

Phase 1: Core Sequencing

Create interaction.tool.ts with basic execute_sequence
Implement action dispatch (reuse existing tool logic)
Capture before/after state using getElements
Compute simple delta (appeared/disappeared by selector)

Phase 2: Stability Detection

Implement stability-detector.ts with polling loop
Add loading indicator detection
Add configurable timeouts
Handle edge cases (infinite loading, rapid changes)

Phase 3: Enriched Feedback

Add changed detection (same element, different content)
Add URL/title change tracking
Add verbose mode for debugging
Optimize performance (signature-based fast path)

Phase 4: Multi-Session Support

Add sessionId parameter to target specific sessions
Enable parallel sequences across sessions
Coordinate with sub-agent architecture

Usage Examples

Basic Login

execute_sequence({
  actions: [
    { action: 'set_value', selector: '#email', value: '[email protected]' },
    { action: 'set_value', selector: '#password', value: 'secret123' },
    { action: 'click_element', selector: '#login-button' }
  ]
})

Response:

{
  "completed": 3,
  "stateChange": {
    "url": { "from": "/login", "to": "/dashboard" },
    "appeared": [
      { "selector": "#welcome-message", "tagName": "h1", "text": "Welcome back!" },
      { "selector": "#user-menu", "tagName": "nav" }
    ],
    "disappeared": [
      { "selector": "#login-form", "tagName": "form" }
    ],
    "changed": []
  },
  "stabilityWaitMs": 650
}

Form with Validation Error

execute_sequence({
  actions: [
    { action: 'set_value', selector: '#email', value: 'invalid-email' },
    { action: 'click_element', selector: '#submit' }
  ]
})

Response:

{
  "completed": 2,
  "stateChange": {
    "appeared": [
      { "selector": ".error-message", "tagName": "div", "text": "Please enter a valid email" }
    ],
    "disappeared": [],
    "changed": [
      { "selector": "#email", "field": "className", "from": "input", "to": "input error" }
    ]
  },
  "stabilityWaitMs": 520
}

Action Failure Mid-Sequence

execute_sequence({
  actions: [
    { action: 'set_value', selector: '#username', value: 'test' },
    { action: 'click_element', selector: '#nonexistent-button' },
    { action: 'set_value', selector: '#other-field', value: 'never reached' }
  ]
})

Response:

{
  "completed": 1,
  "failed": {
    "index": 1,
    "action": "click_element",
    "error": "Element not found: #nonexistent-button"
  },
  "stateChange": {
    "changed": [
      { "selector": "#username", "field": "value", "from": "", "to": "test" }
    ]
  },
  "stabilityWaitMs": 100
}

Verbose Mode for Debugging

execute_sequence({
  actions: [
    { action: 'navigate', url: 'https://example.com' },
    { action: 'click_element', selector: '#menu-toggle' },
    { action: 'click_element', selector: '#settings-link' }
  ],
  verbose: true
})

Response:

{
  "completed": 3,
  "stateChange": { "url": { "from": "/", "to": "/settings" }, ... },
  "stabilityWaitMs": 1200,
  "steps": [
    { "action": "navigate", "result": "ok", "durationMs": 450 },
    { "action": "click_element", "result": "ok", "durationMs": 85 },
    { "action": "click_element", "result": "ok", "durationMs": 92 }
  ]
}

Open Questions

1. Should stability detection be configurable per-action?

Some actions (like set_value) rarely cause async changes. Could skip stability check for them:

{ action: 'set_value', selector: '#name', value: 'test', skipStability: true }

Concern: Adds complexity. Maybe just rely on the final stability check.

2. How to handle infinite loading states?

Options:

Hard timeout (current approach) — returns partial delta
Detect specific loading patterns — report "page still loading"
Let AI decide — return { stable: false, reason: 'loading indicator visible' }

Recommendation: Timeout with diagnostic info about why we timed out.

3. Should delta include off-screen elements?

Current getElements filters to viewport by default. For delta:

Viewport only = might miss elements that scrolled in/out
Full page = more accurate but larger payload

Recommendation: Full page for delta computation, but truncate to top N appeared/disappeared.

4. Performance: Full diff vs. key signals

Two comparison strategies:

Full diff: Compare all elements every poll (accurate, expensive)
Key signals: Compare signature only during polling, full diff only at end (fast, might miss rapid changes)

Recommendation: Key signals for polling, full diff once at end.

5. What about conditional actions?

Should we support:

{ action: 'click_element', selector: '#cookie-banner', optional: true }

Concern: Scope creep. The AI can handle conditionals itself. Keep the tool simple.

Compatibility

Existing Tools

execute_sequence complements existing tools:

Simple single actions still use click_element, set_value, etc.
Complex workflows use execute_sequence
No breaking changes to existing tools

Mobile Support

Works identically for mobile sessions:

execute_sequence({
  actions: [
    { action: 'tap_element', selector: '~loginButton' },
    { action: 'set_value', selector: '~usernameField', value: 'test' },
    { action: 'swipe', direction: 'up' }
  ]
})

Multi-Session (Future)

When multi-session support lands:

execute_sequence({
  sessionId: 'user-a',
  actions: [...]
})

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Interaction Sequencing Architecture Proposal

Problem Statement

Current Flow (Inefficient)

Desired Flow

Proposed Design

New Tool: `execute_sequence`

State Delta Structure

Response Structure

Stability Detection Algorithm

Why Stability Matters

Algorithm

Key Signals for Stability Check (Fast Path)

Loading Indicator Detection

Implementation Plan

File Structure

Phase 1: Core Sequencing

Phase 2: Stability Detection

Phase 3: Enriched Feedback

Phase 4: Multi-Session Support

Usage Examples

Basic Login

Form with Validation Error

Action Failure Mid-Sequence

Verbose Mode for Debugging

Open Questions

1. Should stability detection be configurable per-action?

2. How to handle infinite loading states?

3. Should delta include off-screen elements?

4. Performance: Full diff vs. key signals

5. What about conditional actions?

Compatibility

Existing Tools

Mobile Support

Multi-Session (Future)

References

Uh oh!

FilesExpand file tree

interaction-sequencing-proposal.md

Latest commit

History

interaction-sequencing-proposal.md

File metadata and controls

Interaction Sequencing Architecture Proposal

Problem Statement

Current Flow (Inefficient)

Desired Flow

Proposed Design

New Tool: execute_sequence

State Delta Structure

Response Structure

Stability Detection Algorithm

Why Stability Matters

Algorithm

Key Signals for Stability Check (Fast Path)

Loading Indicator Detection

Implementation Plan

File Structure

Phase 1: Core Sequencing

Phase 2: Stability Detection

Phase 3: Enriched Feedback

Phase 4: Multi-Session Support

Usage Examples

Basic Login

Form with Validation Error

Action Failure Mid-Sequence

Verbose Mode for Debugging

Open Questions

1. Should stability detection be configurable per-action?

2. How to handle infinite loading states?

3. Should delta include off-screen elements?

4. Performance: Full diff vs. key signals

5. What about conditional actions?

Compatibility

Existing Tools

Mobile Support

Multi-Session (Future)

References

New Tool: `execute_sequence`