Cursor Lens is a small but credible multi-agent system for engagement-weighted community signal triage. It filters sampled community posts, groups recurring topics, and produces a cautious strategic briefing with alerts and auditability.
This repository is intentionally a prototype, not a production moderation platform. The goal is to demonstrate real coordination, real safety controls, and clear thinking about how the system could fail.
Cursor Lens treats moderation-like filtering as a signal-quality mechanism for community intelligence.
In this prototype:
- not every post is treated as equally valuable evidence
- engagement signals proxy for community attention
- medium-confidence items are preserved for human review
- only filtered posts reach trend synthesis
- topic clusters must meet minimum support before they are surfaced
- the final output is a strategic briefing, not just a moderation label
That is the core product claim: cleaner strategic insight comes from weighting community signals before summarization.
- Use case: triage a monthly sample of community posts and separate weak signals from more durable recurring themes.
- Primary stakeholders: community managers, CMO or strategy leads, analysts, and human reviewers.
- Objective: convert a noisy post sample into inspectable classifications, review decisions, topic clusters, and strategic alerts.
- Failure stakes: if the system over-filters, it hides useful signal; if it under-filters, it lets noise dominate trend outputs; if it overstates weak evidence, downstream teams may act on misleading narratives.
A single agent is not enough here because the work mixes distinct tasks with different reliability needs:
- classification should be cheap, deterministic, and easy to audit
- policy routing should be explicit and rule-based
- trend synthesis needs semantic grouping across many posts
- executive summarization needs a different prompt, output contract, and risk framing
Splitting those responsibilities creates cleaner failure boundaries. The system can inspect and log each stage, stop after policy if needed, and compare local deterministic decisions against downstream generative interpretation.
| Agent / Component | Role | Tools | Memory | Permissions |
|---|---|---|---|---|
| CSV importer | Load and sample raw data | Python CSV parsing | none | read local dataset only |
| Classifier Agent | Score engagement from `score` and `num_comments` | deterministic Python | blackboard write/read | no network |
| Policy Agent | Route posts to pass/review/reject | deterministic Python | blackboard write/read | no network |
| Trend Agent | Group passed posts into topic clusters | Anthropic structured output | blackboard write/read | outbound LLM only |
| Lens Agent | Produce alerts and CMO briefing | Anthropic structured output | blackboard write/read | outbound LLM only |
| Blackboard / SQLite | Shared state and audit log | in-memory + SQLite | persistent state/log | local file only |
| Appeal Agent stub | Record contested policy outcomes for later review | FastAPI endpoint | blackboard write/read | no model access |
| Human reviewer | Inspect medium-band posts and appeals | API review endpoint | blackboard read | human judgment |
The current implementation covers the classifier, policy, and trend agents directly, along with auditor-like logging. Escalation exists as policy-to-review routing, and appeal exists as a lightweight stub that records contested decisions for later human handling.
flowchart TD
A[data/raw/reddit_books/posts.csv] --> B[ingestion/reddit_books.py]
B --> C[CommunityPost schema]
C --> D[Classifier Agent<br/>deterministic engagement score]
D --> E[Policy Agent<br/>pass / review / reject]
E -->|pass| F[Trend Agent<br/>LLM clustering]
E -->|review| G[Review Queue]
F --> H[Recurrence Policy<br/>4-bucket monthly rule]
H --> I[Lens Agent<br/>LLM briefing + alerts]
D --> J[Blackboard]
E --> J
F --> J
I --> J
J --> K[SQLite state + audit log]
J --> L[FastAPI endpoints]
The system uses a blackboard coordination pattern. Agents do not message each other directly; instead they publish typed results into shared state.
Core message shapes:
- `CommunityPost`: normalized input record
- `ClassificationResult`: `post_id`, `engagement_score`, `engagement_tier`, `score_signal`, `comment_signal`, `reasoning`
- `PolicyResult`: `post_id`, `status`, `passes_filter`, `filter_reason`, `policy_citation`, `confidence`
- `TrendOutput`: topic clusters, support share, recurrence status, dominant theme
- `LensOutput`: health index, engagement-weighted sentiment, narrative, alerts, briefing
Routing:
- importer writes normalized posts into the batch flow
- classifier writes `classification_<post_id>`
- policy reads classification and writes `policy_<post_id>`
- review queue is derived from `policy_*` records with status `review`
- trend reads only passed posts
- lens reads batch summary plus trend output
Escalation:
- `medium` engagement posts become `review`
- review items are exposed through `GET /review`
- contested decisions can be posted to `POST /appeals`
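The routing and escalation rules above can be sketched as a tiny blackboard. This is a minimal illustration, not the code in `blackboard.py` or `storage.py`: the `Blackboard` class, its method names, and the `audit_log` table layout are all assumptions made for the example. Only the key naming (`classification_<post_id>`, `policy_<post_id>`) and the derived review queue come from the description above.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional


class Blackboard:
    """Hypothetical shared-state store: in-memory dict plus a SQLite audit log."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.state = {}
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS audit_log "
            "(ts TEXT, agent TEXT, key TEXT, value TEXT)"
        )

    def write(self, agent: str, key: str, value: dict) -> None:
        # Every write updates shared state and appends a timestamped log row.
        self.state[key] = value
        ts = datetime.now(timezone.utc).isoformat()
        self.db.execute(
            "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
            (ts, agent, key, json.dumps(value)),
        )
        self.db.commit()

    def read(self, key: str) -> Optional[dict]:
        return self.state.get(key)

    def review_queue(self) -> list:
        # The review queue is derived from policy_* records, not stored separately.
        return [
            value["post_id"]
            for key, value in self.state.items()
            if key.startswith("policy_") and value.get("status") == "review"
        ]


board = Blackboard()
board.write("classifier", "classification_t3_a", {"post_id": "t3_a", "engagement_tier": "medium"})
board.write("policy", "policy_t3_a", {"post_id": "t3_a", "status": "review"})
board.write("policy", "policy_t3_b", {"post_id": "t3_b", "status": "pass"})
print(board.review_queue())  # ['t3_a']
```

Because agents only read and write keys, a human reviewer or a new downstream agent can be added as another reader without touching the writers.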
Example message contract:
{
"post_id": "t3_example",
"classification": {
"engagement_score": 0.54,
"engagement_tier": "medium",
"score_signal": 0.61,
"comment_signal": 0.39,
"reasoning": "Engagement score is derived from score and num_comments using a weighted log-scale heuristic."
},
"policy": {
"post_id": "t3_example",
"status": "review",
"passes_filter": false,
"filter_reason": "Post sits in the medium-engagement band and stays for review.",
"policy_citation": "Policy 1.2",
"confidence": "medium"
}
}

Shared state keys are persisted in `blackboard.py` and `storage.py`.
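The `classification` and `policy` fields in the example contract could be produced along the following lines. This is a sketch under stated assumptions, not the repository's classifier: the saturation caps (`5000`, `1000`), the `0.7`/`0.3` weights, the tier thresholds, and the function names `log_signal`, `classify`, and `route` are all hypothetical. The source only says that the score comes from `score` and `num_comments` through a weighted log-scale heuristic and that routing is deterministic.

```python
import math


def log_signal(value: int, cap: int) -> float:
    """Map a non-negative count onto [0, 1] on a log scale, saturating at `cap`."""
    return min(1.0, math.log1p(max(value, 0)) / math.log1p(cap))


def classify(post_id: str, score: int, num_comments: int) -> dict:
    # Caps, weights, and tier thresholds are illustrative assumptions.
    score_signal = log_signal(score, cap=5000)
    comment_signal = log_signal(num_comments, cap=1000)
    engagement = 0.7 * score_signal + 0.3 * comment_signal
    if engagement >= 0.6:
        tier = "high"
    elif engagement >= 0.4:
        tier = "medium"
    else:
        tier = "low"
    return {
        "post_id": post_id,
        "engagement_score": round(engagement, 2),
        "engagement_tier": tier,
        "score_signal": round(score_signal, 2),
        "comment_signal": round(comment_signal, 2),
    }


def route(classification: dict) -> dict:
    # Assumed deterministic mapping: high -> pass, medium -> review, low -> reject.
    status_by_tier = {"high": "pass", "medium": "review", "low": "reject"}
    status = status_by_tier[classification["engagement_tier"]]
    return {
        "post_id": classification["post_id"],
        "status": status,
        "passes_filter": status == "pass",
    }


result = classify("t3_example", score=180, num_comments=14)
print(result["engagement_tier"], route(result)["status"])  # medium review
```

Keeping both steps as pure functions of the input fields is what makes each decision reproducible from the audit log alone.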
This system uses a blackboard architecture.
Why blackboard over a single supervisor:
- each stage produces an inspectable artifact
- state is naturally auditable and replayable
- downstream agents can be changed without rewriting upstream logic
- human review fits naturally as another consumer of shared state
Why not consensus or market-based coordination:
- the task is sequential and evidence-accumulating, not adversarial
- there is little value in agent bidding or multi-vote consensus for this small prototype
- the main need is traceability, not decentralized negotiation
The current system is cooperative, not competitive.
- local objective of classifier: assign a stable engagement tier from available signals
- local objective of policy: avoid contaminating trend synthesis with weak posts
- local objective of trend agent: maximize topical coherence across passed posts
- local objective of lens agent: maximize decision usefulness without overstating evidence
- global objective: produce actionable but cautious community intelligence
Potential incentive misalignment:
- a strict classifier improves precision but can lower coverage
- a permissive policy increases recall but may reduce downstream quality
- a summarizer can sound more confident than the evidence warrants
The current design handles that by making routing deterministic and by exposing signal quality directly in the final output.
Expected emergent behavior:
- recurring clusters may reveal themes not obvious from any one post
- interaction between engagement filtering and recurrence policy can surface stronger long-horizon signals than raw popularity alone
Unwanted emergent behavior:
- over-representation of high-score posts can bias the narrative toward already popular topics
- medium-band posts routed to review may create blind spots if no human reviews them
- Lens may produce a persuasive narrative from a weak sample unless the prompts and signal-quality warnings stay strict
- repeated monthly sampling could stabilize around the sampling heuristic rather than the real community distribution
- human-in-the-loop: medium-band posts go to `/review`
- appeal capture: contested decisions can be recorded via `/appeals`
- audit log: every write is timestamped and persisted in SQLite
- rollback: local state can be wiped with `/system/reset-local-state`
- budget safety: local spend reservations and a hard app budget exist in `budget.py`
- bounded output: Trend and Lens use structured schema validation
- sample-scope warnings: prompts require explicit mention that this is a historical sampled diagnostic
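The budget-safety control can be pictured as a reserve-then-settle guard. The sketch below is hypothetical and does not reproduce `budget.py`: the `SpendGuard` class, its method names, and the dollar figures are assumptions chosen only to show how a reservation and a hard limit interact with the `confirm_spend` flag.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a reservation would push spend past the hard limit."""


class SpendGuard:
    """Hypothetical reservation-based budget: reserve before a call, settle after."""

    def __init__(self, hard_limit_usd: float) -> None:
        self.hard_limit_usd = hard_limit_usd
        self.spent_usd = 0.0
        self.reserved_usd = 0.0

    def reserve(self, estimate_usd: float, confirm_spend: bool) -> None:
        # Paid calls are refused unless the caller explicitly confirms spend.
        if not confirm_spend:
            raise PermissionError("paid call requires confirm_spend=true")
        projected = self.spent_usd + self.reserved_usd + estimate_usd
        if projected > self.hard_limit_usd:
            raise BudgetExceeded("reservation would exceed the hard budget")
        self.reserved_usd += estimate_usd

    def settle(self, estimate_usd: float, actual_usd: float) -> None:
        # Release the reservation and record what the call actually cost.
        self.reserved_usd -= estimate_usd
        self.spent_usd += actual_usd


guard = SpendGuard(hard_limit_usd=0.10)
guard.reserve(0.05, confirm_spend=True)   # before the LLM call
guard.settle(0.05, actual_usd=0.03)       # after the LLM call
```

Reserving before the call means two concurrent runs cannot both pass the limit check and then jointly overspend.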
Known failure cases:
- long-tailed but important low-engagement discussions are filtered out
- the sample is historical and engagement-weighted, not population-representative
- comment count and score are only proxies for importance, not truth or safety
- appeal capture exists, but final adjudication is still human and not automated
Observability:
- `GET /system/health`
- `GET /system/budget`
- `GET /blackboard`
- `GET /blackboard/log`
- `GET /history/trends`
- CLI summary output in `run_reddit_books.py`
Human controls:
- `confirm_spend=true` for paid calls
- reset endpoint for local rollback
- review queue for medium-band items
- appeal queue for contested policy outcomes
Evaluation:
- agent level: classifier tier stability, policy routing distribution
- interaction level: number of passed posts, review load, trend-cluster coherence
- system level: signal quality, confirmed trend count, alert usefulness, cost
- human level: whether analysts agree the alerts are useful and appropriately cautious
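Two of the agent-level measures above, policy routing distribution and classifier tier stability, are simple enough to compute directly from stored results. The helper names and the toy inputs below are hypothetical; the sketch assumes policy results carry a `status` field as in the message contract, and that tier stability is measured as the share of posts keeping the same tier between two runs.

```python
from collections import Counter


def routing_distribution(policy_results: list) -> dict:
    """Share of posts routed to each status; empty dict for an empty batch."""
    counts = Counter(result["status"] for result in policy_results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: counts.get(status, 0) / total for status in ("pass", "review", "reject")}


def tier_stability(run_a: dict, run_b: dict) -> float:
    """Fraction of shared post_ids whose engagement tier is identical in both runs."""
    shared = run_a.keys() & run_b.keys()
    if not shared:
        return 1.0
    return sum(run_a[post_id] == run_b[post_id] for post_id in shared) / len(shared)


# Toy data, not results from the repository.
toy_results = [{"status": "pass"}, {"status": "pass"}, {"status": "review"}, {"status": "reject"}]
print(routing_distribution(toy_results))  # {'pass': 0.5, 'review': 0.25, 'reject': 0.25}
print(tier_stability({"a": "high", "b": "low"}, {"a": "high", "b": "medium"}))  # 0.5
```

With a deterministic classifier, stability on identical input should be 1.0, so the measure is mainly useful when thresholds or weights change between runs.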
This prototype is local and Python-native, but the boundaries are already clear:
- classifier and policy could be exposed as internal deterministic services
- trend and lens could become A2A-style specialist agents with typed contracts
- blackboard storage is the natural seam for MCP-style tool access to state, logs, and review queues
The most important interoperability requirement is stable schemas, not model swapping.
Multi-agent reinforcement learning is not appropriate for the current stage.
Why not:
- the task is dominated by auditability and policy clarity, not exploration
- reward design would be brittle and hard to validate
- human trust would likely decrease if routing policy emerged from opaque RL
Where MARL could matter later:
- sampling policy optimization
- reviewer-assignment scheduling
- adaptive escalation thresholds under tightly controlled offline simulation
For this prototype, deterministic routing plus prompted synthesis is the better engineering choice.
The repo contains:
- a working CLI run in run_reddit_books.py
- a mock simulation endpoint in main.py
- local tests covering classifier, policy, orchestration, storage, time windows, and API behavior
Recent sample run:
- `60` sampled posts
- `28` passed into trend synthesis
- `4` supported themes in the latest live run
- recurrence is still checked in the backend, but the dashboard emphasizes supported themes over strict trend counts
- `3` alerts
- run cost under `$0.07`
Worked scenario:
- The importer samples 60 historical `r/books` posts across four weekly buckets.
- The classifier assigns each post `high`, `medium`, or `low` engagement from `score` and `num_comments`.
- The policy agent sends `medium` posts to the review queue and excludes `low` posts from trend synthesis.
- The trend agent clusters only the passed posts and labels every cluster as a `signal` or a `trend` based on four-bucket recurrence.
- The lens agent turns those clusters into alerts and a marketing briefing while explicitly warning when evidence is weak.
- In the latest run, the system found `11` topic clusters, `0` confirmed trends, and `3` alerts. That is a worked example of useful coordination without overstating confidence.
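The `signal` versus `trend` labeling in the scenario can be sketched as a check over per-bucket counts. The exact rule is not spelled out here, so the version below assumes that a cluster counts as a `trend` only when it has at least one post in every one of the four weekly buckets; the function name, the `min_buckets` parameter, and the toy counts are hypothetical.

```python
def recurrence_status(bucket_counts: list, min_buckets: int = 4) -> str:
    """Label a cluster from its per-bucket post counts.

    Assumption: a cluster is a 'trend' only if it appears in at least
    `min_buckets` weekly buckets; otherwise it stays a 'signal'.
    """
    active_buckets = sum(1 for count in bucket_counts if count > 0)
    return "trend" if active_buckets >= min_buckets else "signal"


# Toy per-week counts for two illustrative clusters.
print(recurrence_status([3, 2, 1, 4]))  # trend
print(recurrence_status([5, 0, 2, 0]))  # signal
```

Under a rule like this, a cluster with many posts in a single week still stays a `signal`, which is consistent with a run that reports clusters but no confirmed trends.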
If you want to understand the system quickly, use the dashboard instead of the raw API docs.
- Start the app.
- Open `http://127.0.0.1:8000/`.
- Click `Preview Monthly Sample` to inspect the historical input window.
- Click `Generate Intelligence View` to run the classifier, policy, trend, and lens pipeline.
- Read the page in this order:
  - `Intelligence View`
  - `Community Health`
  - `Alert Stack`
  - `Appeals & Escalations`
  - `Engagement Breakdown`
  - `System Status`
What to notice during the demo:
- passed posts are fewer than sampled posts because filtering is intentional
- weak micro-topics are suppressed when support is too thin
- recurrence is still checked, but the product surface emphasizes supported themes
- the final layer stays cautious about evidence quality
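The suppression of weak micro-topics can be illustrated with a minimum-support filter. This is a sketch only: the thresholds `min_posts` and `min_share`, the function name `surface_clusters`, and the toy clusters are assumptions, since the prototype's actual support rule is not documented in this section.

```python
def surface_clusters(clusters: dict, passed_total: int,
                     min_posts: int = 3, min_share: float = 0.1) -> list:
    """Return (topic, support_share) pairs for clusters with enough support.

    `clusters` maps a topic label to the post_ids assigned to it.
    Thresholds are illustrative assumptions.
    """
    if passed_total <= 0:
        return []
    surfaced = []
    for topic, post_ids in clusters.items():
        share = len(post_ids) / passed_total
        if len(post_ids) >= min_posts and share >= min_share:
            surfaced.append((topic, share))
    # Strongest support first, so the briefing leads with the best evidence.
    return sorted(surfaced, key=lambda pair: pair[1], reverse=True)


# Toy clusters over 20 passed posts, not output from the repository.
toy_clusters = {
    "audiobooks": ["p7", "p8", "p9"],
    "adaptations": ["p1", "p2", "p3", "p4", "p5", "p6"],
    "one-off": ["p10"],
}
print(surface_clusters(toy_clusters, passed_total=20))
# [('adaptations', 0.3), ('audiobooks', 0.15)]
```

Requiring both an absolute count and a relative share keeps a single popular post from becoming a theme on its own.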
For a shorter walkthrough, see docs/demo_script.md.
uv sync
Copy-Item .env.example .env
uv run uvicorn main:app --reload

Open http://127.0.0.1:8000/docs.
For the cleaner presentation layer, open http://127.0.0.1:8000/.
Presentation-friendly endpoints:
- `GET /alerts` returns counts plus alert items
- `GET /review` returns a count plus presentation-friendly review items
- `GET /appeals` returns the current appeal queue
- `POST /appeals` records a contested decision for later human handling
uv run pytest
uv run ruff check .
uv run python reset_local_state.py --confirm-reset
uv run python run_reddit_books.py --confirm-spend
uv run python run_reddit_books.py --confirm-spend --json

The current input is `posts.csv`, an engagement-weighted historical `r/books` sample. The untouched source archive is preserved beside it as `source_archive.zip`.
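One way to build an engagement-weighted sample with a fixed number of posts per weekly bucket is weighted sampling without replacement. The sketch below uses the Efraimidis-Spirakis key method (`u ** (1 / weight)`) and is not taken from `ingestion/reddit_books.py`: the `bucket` field, the weight formula `1 + score + num_comments`, the seed, and the synthetic posts are all assumptions for illustration.

```python
import random
from collections import defaultdict


def sample_per_bucket(posts: list, per_bucket: int, seed: int = 0) -> list:
    """Draw up to `per_bucket` posts from each bucket, favoring higher engagement."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    buckets = defaultdict(list)
    for post in posts:
        buckets[post["bucket"]].append(post)

    sample = []
    for bucket in sorted(buckets):
        keyed = []
        for post in buckets[bucket]:
            # Assumed weight: any positive function of the engagement fields works.
            weight = 1 + post["score"] + post["num_comments"]
            keyed.append((rng.random() ** (1.0 / weight), post))
        # Largest keys win; this is weighted sampling without replacement.
        keyed.sort(key=lambda pair: pair[0], reverse=True)
        sample.extend(post for _, post in keyed[:per_bucket])
    return sample


# Synthetic posts: four weekly buckets of 30 posts each.
posts = [
    {"post_id": f"t3_{bucket}_{i}", "bucket": bucket, "score": i * 10, "num_comments": i}
    for bucket in range(4)
    for i in range(30)
]
sample = sample_per_bucket(posts, per_bucket=15)
print(len(sample))  # 60
```

With `per_bucket=15` and four buckets this yields 60 posts, which matches the sample size reported for the recent run. Such a sample is deliberately skewed toward high-engagement posts and is not population-representative.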
Preview without cost:
GET /datasets/reddit-books/preview?sample_per_bucket=15
Run the paid analysis:
POST /datasets/reddit-books/run?sample_per_bucket=15&confirm_spend=true
With the current local classifier, only Trend and Lens call Anthropic during a dataset run.