
Investigate rate limiting on MCP API calls after authentication #844

@sentry-junior

Description

Summary

Customers using Cursor Automations with the Sentry MCP server are hitting rate limit errors shortly after authenticating. Reported symptoms include all API calls being rate-limited within a few minutes of authentication, and automation runs being blocked with the error "The automation was rate-limited due to too many concurrent runs."

Symptoms

  • Rate limit errors triggered within minutes of initial authentication
  • Affects all API calls (not isolated to a specific endpoint)
  • Observed in concurrent/parallel automation run scenarios (e.g. Cursor Automations)
  • Error: "The automation was rate-limited due to too many concurrent runs. Retry after a short delay or reduce the number of parallel automation runs."

How Rate Limiting Is Implemented

Rate limiting runs in the Cloudflare Worker (packages/mcp-cloudflare) via Cloudflare's native RateLimit binding. There are two independent layers:

Layer 1: IP-based (pre-auth)

  • Applied to all /mcp and /oauth routes before OAuth processing
  • Binding: MCP_IP_RATE_LIMITER (fallback: MCP_RATE_LIMITER)
  • Key: mcp:ip:<sha256-of-ip>[0:16]
  • Threshold: 300 requests / 60 seconds per IP
  • Source: src/server/index.ts

Layer 2: Per-user (post-auth)

  • Applied after OAuth token validation, inside the MCP handler
  • Binding: MCP_USER_RATE_LIMITER (fallback: MCP_RATE_LIMITER)
  • Key: mcp:user:<sha256-of-userId>[0:16]
  • User ID = Sentry user ID (from OAuth token, payload.user.id) — shared across all clients/tokens for the same Sentry account
  • Threshold: 60 requests / 60 seconds per user
  • Source: src/server/lib/mcp-handler.ts
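
The two layers described above can be sketched as follows. This is a minimal illustration, not the actual code in packages/mcp-cloudflare: the `RateLimit` interface mirrors Cloudflare's native binding shape, and helper names like `hashKey` and `checkLimit` are illustrative.

```typescript
import { createHash } from "node:crypto";

// Shape of Cloudflare's native RateLimit binding.
interface RateLimit {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

// First 16 hex chars of SHA-256, matching the key scheme described above
// (privacy-preserving: raw IPs / user IDs never become bucket keys).
function hashKey(prefix: string, id: string): string {
  const digest = createHash("sha256").update(id).digest("hex");
  return `${prefix}${digest.slice(0, 16)}`;
}

// Fail-open: a missing binding (local dev) or a limiter error allows the request.
async function checkLimit(
  limiter: RateLimit | undefined,
  key: string,
): Promise<boolean> {
  if (!limiter) return true;
  try {
    const { success } = await limiter.limit({ key });
    return success;
  } catch {
    return true;
  }
}

// Layer 1 (pre-auth), keyed by client IP:
//   await checkLimit(env.MCP_IP_RATE_LIMITER ?? env.MCP_RATE_LIMITER,
//                    hashKey("mcp:ip:", clientIp));
// Layer 2 (post-auth), keyed by the Sentry user ID from the OAuth token:
//   await checkLimit(env.MCP_USER_RATE_LIMITER ?? env.MCP_RATE_LIMITER,
//                    hashKey("mcp:user:", payload.user.id));
```

Note how both layers funnel through the same fallback (`?? env.MCP_RATE_LIMITER`), which is why an undeployed legacy binding silently turns both layers into allow-all.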

Key implementation details

  • Rate limit keys use the first 16 hex chars of SHA-256 of the identifier (privacy-preserving, but with a ~1/10^19 collision risk — negligible)
  • Both bindings are defined in wrangler.jsonc for prod and canary, so they are wired up. However, the namespace IDs in the file (10011004, 20012004) are local-dev mock IDs; production bindings must be configured via the Cloudflare dashboard and should override these.
  • If the rate limiter binding is unavailable (e.g. local dev), requests are allowed by default (fail-open)
  • On rate limiter errors, requests are also allowed (fail-open)
  • The legacy MCP_RATE_LIMITER binding is still used as a fallback in code but is not present in wrangler.jsonc — if not deployed in prod, both layers transparently fall back to allow-all

Rate Limit Config (wrangler.jsonc — prod and canary use separate namespace IDs but identical limits)

| Binding | Limit | Period | Scope |
| --- | --- | --- | --- |
| MCP_IP_RATE_LIMITER | 300 req | 60s | per IP |
| MCP_USER_RATE_LIMITER | 60 req | 60s | per Sentry user ID |
| CHAT_RATE_LIMITER | 10 req | 60s | chat routes |
| SEARCH_RATE_LIMITER | 20 req | 60s | search routes |
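
For reference, Cloudflare's rate limiting bindings are declared under `unsafe.bindings` in wrangler config. The fragment below is a sketch of what the per-user binding would look like, using the dev mock namespace ID from the file; it is not a verbatim copy of the repo's wrangler.jsonc.

```jsonc
{
  "unsafe": {
    "bindings": [
      {
        "name": "MCP_USER_RATE_LIMITER",
        "type": "ratelimit",
        // Local-dev mock ID — prod must use the real namespace ID
        // configured in the Cloudflare dashboard.
        "namespace_id": "10011004",
        "simple": { "limit": 60, "period": 60 }
      }
    ]
  }
}
```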

Why Cursor Automations Are Likely Hitting This

Cursor Automations can spawn multiple concurrent runs, each of which fires multiple MCP tool calls. All runs from the same authenticated Sentry user share the same MCP_USER_RATE_LIMITER bucket (keyed by Sentry user ID). At 60 req/60s, a user running 3–4 parallel automation runs with moderate tool use can saturate the limit within seconds. Cloudflare's rate limits are fixed-window, not sliding — this means burst behavior at window boundaries can make it feel more aggressive than the numbers suggest.

The IP-based limit (300 req/60s) is less likely to be the culprit for single-user scenarios but could be a factor for shared corporate egress (NAT/proxy).

Sentry Instrumentation Gap

There are currently no Sentry events or metrics emitted when a rate limit is hit. The 429 response is returned directly with no captureException, captureEvent, or custom metric. The sentryBeforeSend hook only handles scrubbing + fingerprinting — it does not add rate limit visibility. Cloudflare's built-in observability (observability.enabled: true) may capture request-level data, but we have no Sentry-side signal to alert on, trend, or correlate with user reports.

We should add Sentry instrumentation at both rate limit check points to track:

  • Which limit was hit (IP vs user)
  • The rate-limited identifier (hashed, already done in the key)
  • Frequency / volume of 429s over time
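
A minimal version of that instrumentation could be a structured `console.warn` at each check point, which Sentry's consoleLoggingIntegration would pick up. The helper below is a hypothetical sketch; the field names (`limitType`, `keyHash`) are illustrative, not existing code, and a custom metric or breadcrumb via the Sentry SDK could replace it later.

```typescript
type LimitType = "ip" | "user";

// Emits one structured warning per 429. `hashedKey` is the truncated
// SHA-256 already used as the bucket key, so no raw IP or user ID is logged.
function reportRateLimited(limitType: LimitType, hashedKey: string): string {
  const msg = `mcp rate limit hit: type=${limitType} key=${hashedKey}`;
  console.warn(msg); // captured via consoleLoggingIntegration
  return msg;
}
```

Because the message carries both the layer and the (already hashed) identifier, 429 volume can be trended and correlated with user reports without any new PII exposure.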

Investigation Still Needed

  • Confirm the actual Cloudflare namespace IDs for prod bindings — the wrangler.jsonc values are dev mock IDs
  • Verify whether MCP_USER_RATE_LIMITER and MCP_IP_RATE_LIMITER are actually deployed in the Cloudflare dashboard for prod (if not, the fallback to MCP_RATE_LIMITER would make the limits undefined/allow-all)
  • Add Sentry instrumentation (metric or event) when rate limits are hit — at minimum log with console.warn so it's captured via consoleLoggingIntegration, ideally a custom metric or breadcrumb
  • Evaluate whether 60 req/60s is the right threshold for agentic/automation use cases — this was sized for interactive human usage
  • Consider whether automation clients should get a higher limit or a separate bucket (e.g. keyed by clientId + userId, or elevated for ?agent=1 requests — the query param already exists)
  • Document rate limits publicly so users understand the constraints
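
The "separate bucket / elevated limit" idea from the list above could take roughly this shape. Everything here is hypothetical: the key format, the 300 figure, and the `pickBucket` helper are illustrative options, not decisions — only the `?agent=1` query param exists today.

```typescript
interface BucketConfig {
  key: string;
  limit: number;
}

// Choose a rate limit bucket per user *and* client, with a higher
// threshold for traffic that declares itself as an agent via ?agent=1.
function pickBucket(userId: string, clientId: string, url: URL): BucketConfig {
  const isAgent = url.searchParams.get("agent") === "1";
  return {
    // Per-client key so parallel automation clients don't drain one shared quota.
    key: `mcp:user:${userId}:client:${clientId}`,
    // Illustrative numbers: elevated for agents, interactive default unchanged.
    limit: isAgent ? 300 : 60,
  };
}
```

The trade-off to evaluate: per-client keys multiply a user's effective quota by their client count, so an abusive user could fan out clients — any such scheme likely needs a per-user ceiling on top.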

Expected Behavior

Users should not hit rate limits during normal Cursor Automation usage. If limits are intentional, they should be clearly documented and provide actionable guidance in the error response.

Workaround

Reduce number of parallel automation runs; retry after a short delay.

Action taken on behalf of David Cramer.
