
MCP remote client has no transport-level retry on socket/connection errors #25287

@flupkede

Description


Problem

When a remote MCP server (type: "remote", using StreamableHTTPClientTransport) becomes temporarily unreachable, for example because the server process restarts, the laptop suspends and resumes, or a TCP keep-alive connection goes stale, the MCP client has no way to recover.

What happens today

  1. Server restarts → all in-memory sessions lost
  2. Client sends a tools/call request, but the HTTP connection pool underneath @modelcontextprotocol/sdk's transport still holds a dead keep-alive connection
  3. A socket-level error occurs (not even an HTTP 404); the request never reaches the server
  4. client.callTool() throws an error
  5. In packages/opencode/src/mcp/index.ts, the execute function built by convertMcpTool has no catch/retry logic; the error propagates to Effect.catch, which logs it and returns undefined
  6. The MCP server is marked as "failed" and never reconnects

Why server-side middleware can't fix this

  • Socket errors never reach the server. The client's HTTP library fails before sending a request.
  • Even when a request does reach the server (e.g. an HTTP 404 for a stale session), the server cannot update the session ID the client holds. That state lives inside the transport layer of @modelcontextprotocol/sdk, so only the client can start a new session.

Expected behavior

When a tool call fails due to a transport error, the MCP client should:

  1. Detect that the connection is dead (socket error, ECONNRESET, etc.)
  2. Close the old transport/client
  3. Create a new transport and reconnect (re-initialize)
  4. Retry the original tool call with the new session
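The four steps above can be sketched as a small, transport-agnostic helper. This is a minimal sketch, not OpenCode's code: `ToolClient`, `callWithReconnect`, and the `isTransportError` and `reconnect` callbacks are hypothetical names, and the MCP client is reduced to the two methods the retry path needs.

```typescript
// Hypothetical minimal view of an MCP client: only what the retry path uses.
interface ToolClient {
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>
  close(): Promise<void>
}

// Calls a tool; on a transport error, replaces the client once and retries.
// `reconnect` is assumed to build a brand-new transport, client and session.
// The fresh client is returned so the caller can store it in place of the old one.
async function callWithReconnect(
  current: ToolClient,
  name: string,
  args: Record<string, unknown>,
  isTransportError: (e: unknown) => boolean,
  reconnect: () => Promise<ToolClient>,
): Promise<{ result: unknown; client: ToolClient }> {
  try {
    return { result: await current.callTool(name, args), client: current }
  } catch (e) {
    // Step 1: only dead-connection errors are retried; tool/protocol errors propagate.
    if (!isTransportError(e)) throw e
    // Step 2: close the old client; it may already be broken, so ignore close failures.
    await current.close().catch(() => {})
    // Step 3: fresh transport and session.
    const fresh = await reconnect()
    // Step 4: retry exactly once; a second failure propagates to the caller.
    return { result: await fresh.callTool(name, args), client: fresh }
  }
}
```

Retrying exactly once keeps a server that is genuinely down from turning every tool call into a reconnect loop.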

Suggested fix

In convertMcpTool (packages/opencode/src/mcp/index.ts), wrap the execute function with transport-level retry:

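execute: async (args: unknown) => {
  const callArgs = { name: mcpTool.name, arguments: (args || {}) as Record<string, unknown> }
  const callOptions = { resetTimeoutOnProgress: true, timeout }
  try {
    return await client.callTool(callArgs, CallToolResultSchema, callOptions)
  } catch (e) {
    // Only transport-level failures (dead socket, ECONNRESET, ...) warrant a reconnect;
    // tool and protocol errors are rethrown unchanged. isTransportError is not shown here.
    if (isTransportError(e) && clientKey && mcpConfig) {
      log.warn("MCP transport error, attempting reconnect", {
        clientKey,
        // `e` is `unknown` in a TypeScript catch clause, so narrow before reading .message
        error: e instanceof Error ? e.message : String(e),
      })
      try {
        // The old transport is probably already dead; a failing close() must not block the reconnect
        await client.close().catch(() => {})
        // Re-run the normal connect path: new transport, new initialize, new session
        const result = await createAndStore(clientKey, { ...mcpConfig, enabled: true })
        const fresh = state.clients[clientKey]
        if (result.status === "connected" && fresh) {
          // Retry the original tool call exactly once on the new client
          return await fresh.callTool(callArgs, CallToolResultSchema, callOptions)
        }
      } catch (retryError) {
        log.error("MCP reconnect failed", { clientKey, error: retryError })
      }
    }
    // No reconnect was attempted, or it did not succeed: surface the original error
    throw e
  }
}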
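The snippet above relies on an `isTransportError` helper it does not define. One possible sketch follows. The error codes are assumptions: the `E…` names are common Node.js system error codes, `UND_ERR_SOCKET`/`UND_ERR_CLOSED` come from undici (Node's `fetch`, which reports network failures as `TypeError: fetch failed` with the socket error in `cause`), and the Bun names are unverified guesses that should be checked against real failures. Treating a numeric `code` of 404 as an expired session is also an assumption about how the SDK surfaces HTTP status; the MCP Streamable HTTP spec says a 404 on a request carrying a session ID means the client must start a new session.

```typescript
// Error codes treated as "the connection is dead" (assumed list, see lead-in).
const TRANSPORT_ERROR_CODES = new Set([
  "ECONNRESET", "ECONNREFUSED", "ECONNABORTED", "EPIPE",
  "ETIMEDOUT", "EHOSTUNREACH", "ENETUNREACH",
  "UND_ERR_SOCKET", "UND_ERR_CLOSED", // undici (Node's fetch)
  "ConnectionRefused", "ConnectionClosed", // Bun (assumed, unverified)
])

// Returns true when `e`, or any error in its `cause` chain, looks like a dead
// connection or an expired Streamable HTTP session.
function isTransportError(e: unknown): boolean {
  const seen = new Set<unknown>() // guards against cyclic `cause` chains
  let current: unknown = e
  while (typeof current === "object" && current !== null && !seen.has(current)) {
    seen.add(current)
    const code = (current as { code?: unknown }).code
    if (typeof code === "string" && TRANSPORT_ERROR_CODES.has(code)) return true
    // Assumption: an HTTP status exposed as a numeric `code`; 404 means the session is gone
    if (code === 404) return true
    // fetch wraps the socket error, so keep walking down the cause chain
    current = (current as { cause?: unknown }).cause
  }
  return false
}
```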

A simpler alternative: reuse the existing connect function to re-establish the connection, which also brings in the transport fallback chain (StreamableHTTP → SSE).
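The fallback chain can be sketched generically: try each way of connecting in order and keep the first that succeeds. In the real SDK the connectors would wrap `client.connect()` around a `StreamableHTTPClientTransport` and then an `SSEClientTransport`; here they are plain async functions so the sketch stays self-contained, and `Connector`/`connectWithFallback` are hypothetical names, not existing OpenCode functions.

```typescript
// One way to connect; resolves to a connected client or rejects.
type Connector<C> = { name: string; connect: () => Promise<C> }

// Tries each connector in order (e.g. Streamable HTTP first, then SSE) and
// returns the first client that connects, plus which connector succeeded.
async function connectWithFallback<C>(connectors: Connector<C>[]): Promise<{ client: C; via: string }> {
  const failures: string[] = []
  for (const { name, connect } of connectors) {
    try {
      return { client: await connect(), via: name }
    } catch (e) {
      // Record why this transport failed, then fall through to the next one
      failures.push(`${name}: ${e instanceof Error ? e.message : String(e)}`)
    }
  }
  throw new Error(`all transports failed (${failures.join("; ")})`)
}
```

Routing reconnects through the same function as the initial connect means both paths pick transports the same way.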

Environment

  • OpenCode: latest (v1.x)
  • MCP SDK: @modelcontextprotocol/sdk
  • MCP Server: codesearch serve (streamable HTTP)
  • OS: Windows (but the issue applies to all platforms; any server restart triggers it)
