
Investigate rate limiting on MCP API calls after authentication #844

@sentry-junior

Description

Summary

Customers using Cursor Automations with the Sentry MCP server are hitting rate limit errors shortly after authenticating. Reported symptoms include all API calls being rate-limited within a few minutes of authentication, and automation runs being blocked with the error "The automation was rate-limited due to too many concurrent runs."

Symptoms

  • Rate limit errors triggered within minutes of initial authentication
  • Affects all API calls (not isolated to a specific endpoint)
  • Observed in concurrent/parallel automation run scenarios (e.g. Cursor Automations)
  • Error: "The automation was rate-limited due to too many concurrent runs. Retry after a short delay or reduce the number of parallel automation runs."

How Rate Limiting Is Implemented

Rate limiting runs in the Cloudflare Worker (packages/mcp-cloudflare) via Cloudflare's native RateLimit binding. There are two independent layers:

Layer 1: IP-based (pre-auth)

  • Applied to all /mcp and /oauth routes before OAuth processing
  • Binding: MCP_IP_RATE_LIMITER (fallback: MCP_RATE_LIMITER)
  • Key: mcp:ip:<sha256-of-ip>[0:16]
  • Threshold: 300 requests / 60 seconds per IP
  • Source: src/server/index.ts

Layer 2: Per-user (post-auth)

  • Applied after OAuth token validation, inside the MCP handler
  • Binding: MCP_USER_RATE_LIMITER (fallback: MCP_RATE_LIMITER)
  • Key: mcp:user:<sha256-of-userId>[0:16]
  • User ID = Sentry user ID (from OAuth token, payload.user.id) — shared across all clients/tokens for the same Sentry account
  • Threshold: 60 requests / 60 seconds per user
  • Source: src/server/lib/mcp-handler.ts
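
The two layers described above can be sketched as follows. This is a minimal illustration, not the actual code in packages/mcp-cloudflare: the `RateLimit` interface mirrors Cloudflare's native binding shape, and helper names like `hashKey` and `checkLimit` are illustrative.

```typescript
import { createHash } from "node:crypto";

// Shape of Cloudflare's native RateLimit binding.
interface RateLimit {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

// First 16 hex chars of SHA-256, matching the key scheme described above
// (privacy-preserving: raw IPs / user IDs never become bucket keys).
function hashKey(prefix: string, id: string): string {
  const digest = createHash("sha256").update(id).digest("hex");
  return `${prefix}${digest.slice(0, 16)}`;
}

// Fail-open: a missing binding (local dev) or a limiter error allows the request.
async function checkLimit(
  limiter: RateLimit | undefined,
  key: string,
): Promise<boolean> {
  if (!limiter) return true;
  try {
    const { success } = await limiter.limit({ key });
    return success;
  } catch {
    return true;
  }
}

// Layer 1 (pre-auth), keyed by client IP:
//   await checkLimit(env.MCP_IP_RATE_LIMITER ?? env.MCP_RATE_LIMITER,
//                    hashKey("mcp:ip:", clientIp));
// Layer 2 (post-auth), keyed by the Sentry user ID from the OAuth token:
//   await checkLimit(env.MCP_USER_RATE_LIMITER ?? env.MCP_RATE_LIMITER,
//                    hashKey("mcp:user:", payload.user.id));
```

Note how both layers funnel through the same fallback (`?? env.MCP_RATE_LIMITER`), which is why an undeployed legacy binding silently turns both layers into allow-all.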

Key implementation details

  • Rate limit keys use the first 16 hex chars of SHA-256 of the identifier (privacy-preserving, but with a ~1/10^19 collision risk — negligible)
  • Both bindings are defined in wrangler.jsonc for prod and canary, so they are wired up. However, the namespace IDs in the file (10011004, 20012004) are local-dev mock IDs; production bindings must be configured via the Cloudflare dashboard and should override these.
  • If the rate limiter binding is unavailable (e.g. local dev), requests are allowed by default (fail-open)
  • On rate limiter errors, requests are also allowed (fail-open)
  • The legacy MCP_RATE_LIMITER binding is still used as a fallback in code but is not present in wrangler.jsonc — if not deployed in prod, both layers transparently fall back to allow-all

Rate Limit Config (wrangler.jsonc — prod and canary use separate namespace IDs but identical limits)

| Binding | Limit | Period | Scope |
| --- | --- | --- | --- |
| MCP_IP_RATE_LIMITER | 300 req | 60s | per IP |
| MCP_USER_RATE_LIMITER | 60 req | 60s | per Sentry user ID |
| CHAT_RATE_LIMITER | 10 req | 60s | chat routes |
| SEARCH_RATE_LIMITER | 20 req | 60s | search routes |
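
For reference, Cloudflare's rate limiting bindings are declared under `unsafe.bindings` in wrangler config. The fragment below is a sketch of what the per-user binding would look like, using the dev mock namespace ID from the file; it is not a verbatim copy of the repo's wrangler.jsonc.

```jsonc
{
  "unsafe": {
    "bindings": [
      {
        "name": "MCP_USER_RATE_LIMITER",
        "type": "ratelimit",
        // Local-dev mock ID — prod must use the real namespace ID
        // configured in the Cloudflare dashboard.
        "namespace_id": "10011004",
        "simple": { "limit": 60, "period": 60 }
      }
    ]
  }
}
```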

Why Cursor Automations Are Likely Hitting This

Cursor Automations can spawn multiple concurrent runs, each of which fires multiple MCP tool calls. All runs from the same authenticated Sentry user share the same MCP_USER_RATE_LIMITER bucket (keyed by Sentry user ID). At 60 req/60s, a user running 3–4 parallel automation runs with moderate tool use can saturate the limit within seconds. Cloudflare's rate limits are fixed-window, not sliding — this means burst behavior at window boundaries can make it feel more aggressive than the numbers suggest.

The IP-based limit (300 req/60s) is less likely to be the culprit for single-user scenarios but could be a factor for shared corporate egress (NAT/proxy).

Sentry Instrumentation Gap

There are currently no Sentry events or metrics emitted when a rate limit is hit. The 429 response is returned directly with no captureException, captureEvent, or custom metric. The sentryBeforeSend hook only handles scrubbing + fingerprinting — it does not add rate limit visibility. Cloudflare's built-in observability (observability.enabled: true) may capture request-level data, but we have no Sentry-side signal to alert on, trend, or correlate with user reports.

We should add Sentry instrumentation at both rate limit check points to track:

  • Which limit was hit (IP vs user)
  • The rate-limited identifier (hashed, already done in the key)
  • Frequency / volume of 429s over time
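
A minimal version of that instrumentation could be a structured `console.warn` at each check point, which Sentry's consoleLoggingIntegration would pick up. The helper below is a hypothetical sketch; the field names (`limitType`, `keyHash`) are illustrative, not existing code, and a custom metric or breadcrumb via the Sentry SDK could replace it later.

```typescript
type LimitType = "ip" | "user";

// Emits one structured warning per 429. `hashedKey` is the truncated
// SHA-256 already used as the bucket key, so no raw IP or user ID is logged.
function reportRateLimited(limitType: LimitType, hashedKey: string): string {
  const msg = `mcp rate limit hit: type=${limitType} key=${hashedKey}`;
  console.warn(msg); // captured via consoleLoggingIntegration
  return msg;
}
```

Because the message carries both the layer and the (already hashed) identifier, 429 volume can be trended and correlated with user reports without any new PII exposure.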

Investigation Still Needed

  • Confirm the actual Cloudflare namespace IDs for prod bindings — the wrangler.jsonc values are dev mock IDs
  • Verify whether MCP_USER_RATE_LIMITER and MCP_IP_RATE_LIMITER are actually deployed in the Cloudflare dashboard for prod (if not, the fallback to MCP_RATE_LIMITER would make the limits undefined/allow-all)
  • Add Sentry instrumentation (metric or event) when rate limits are hit — at minimum log with console.warn so it's captured via consoleLoggingIntegration, ideally a custom metric or breadcrumb
  • Evaluate whether 60 req/60s is the right threshold for agentic/automation use cases — this was sized for interactive human usage
  • Consider whether automation clients should get a higher limit or a separate bucket (e.g. keyed by clientId + userId, or elevated for ?agent=1 requests — the query param already exists)
  • Document rate limits publicly so users understand the constraints
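
The "separate bucket / elevated limit" idea from the list above could take roughly this shape. Everything here is hypothetical: the key format, the 300 figure, and the `pickBucket` helper are illustrative options, not decisions — only the `?agent=1` query param exists today.

```typescript
interface BucketConfig {
  key: string;
  limit: number;
}

// Choose a rate limit bucket per user *and* client, with a higher
// threshold for traffic that declares itself as an agent via ?agent=1.
function pickBucket(userId: string, clientId: string, url: URL): BucketConfig {
  const isAgent = url.searchParams.get("agent") === "1";
  return {
    // Per-client key so parallel automation clients don't drain one shared quota.
    key: `mcp:user:${userId}:client:${clientId}`,
    // Illustrative numbers: elevated for agents, interactive default unchanged.
    limit: isAgent ? 300 : 60,
  };
}
```

The trade-off to evaluate: per-client keys multiply a user's effective quota by their client count, so an abusive user could fan out clients — any such scheme likely needs a per-user ceiling on top.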

Expected Behavior

Users should not hit rate limits during normal Cursor Automation usage. If limits are intentional, they should be clearly documented and provide actionable guidance in the error response.

Workaround

Reduce number of parallel automation runs; retry after a short delay.

Action taken on behalf of David Cramer.
