Cursor Lens

Cursor Lens is a small but credible multi-agent system for engagement-weighted community signal triage. It filters sampled community posts, groups recurring topics, and produces a cautious strategic briefing with alerts and auditability.

This repository is intentionally a prototype, not a production moderation platform. The goal is to demonstrate real coordination, real safety controls, and clear thinking about how the system could fail.

What This Demo Shows

Cursor Lens treats moderation-like filtering as a signal-quality mechanism for community intelligence.

In this prototype:

not every post is treated as equally valuable evidence
engagement signals proxy for community attention
medium-confidence items are preserved for human review
only posts that pass the filter reach trend synthesis
topic clusters must meet minimum support before they are surfaced
the final output is a strategic briefing, not just a moderation label

That is the core product claim: cleaner strategic insight comes from weighting community signals before summarization.
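To make the core claim concrete, here is a minimal Python sketch of engagement-weighted sentiment, one of the LensOutput fields. It assumes each post already carries a sentiment value in [-1, 1] and an engagement score in [0, 1]. The README does not say how Lens actually computes sentiment, so `ScoredPost` and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPost:
    post_id: str
    sentiment: float         # assumed scale: -1.0 (negative) to 1.0 (positive)
    engagement_score: float  # assumed scale: 0.0 to 1.0, as in the classifier output


def unweighted_sentiment(posts: list[ScoredPost]) -> float:
    """Plain mean: every post counts equally, regardless of attention."""
    return sum(p.sentiment for p in posts) / len(posts) if posts else 0.0


def weighted_sentiment(posts: list[ScoredPost]) -> float:
    """Mean sentiment with each post weighted by its engagement score."""
    total_weight = sum(p.engagement_score for p in posts)
    if total_weight == 0:
        return 0.0  # no engagement signal at all: report neutral instead of guessing
    return sum(p.sentiment * p.engagement_score for p in posts) / total_weight


# Illustrative input, not data from the repository.
posts = [
    ScoredPost("t3_a", sentiment=0.8, engagement_score=0.9),
    ScoredPost("t3_b", sentiment=-0.6, engagement_score=0.1),
    ScoredPost("t3_c", sentiment=-0.6, engagement_score=0.1),
]
print(round(unweighted_sentiment(posts), 3), round(weighted_sentiment(posts), 3))
```

With these illustrative numbers the plain mean is slightly negative (about -0.13) while the weighted mean is clearly positive (about 0.55), because the single high-attention post dominates. That shift is the intended effect, and it is also the over-representation risk listed under Emergence.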

System Brief

Use case: triage a monthly sample of community posts and separate weak signals from more durable recurring themes.
Primary stakeholders: community managers, CMO or strategy leads, analysts, and human reviewers.
Objective: convert a noisy post sample into inspectable classifications, review decisions, topic clusters, and strategic alerts.
Failure stakes: if the system over-filters, it hides useful signal; if it under-filters, it lets noise dominate trend outputs; if it overstates weak evidence, downstream teams may act on misleading narratives.

Why MAS

A single agent is not enough here because the work mixes distinct tasks with different reliability needs:

classification should be cheap, deterministic, and easy to audit
policy routing should be explicit and rule-based
trend synthesis needs semantic grouping across many posts
executive summarization needs a different prompt, output contract, and risk framing

Splitting those responsibilities creates cleaner failure boundaries. The system can inspect and log each stage, stop after policy if needed, and compare local deterministic decisions against downstream generative interpretation.

Agent Roster

Agent / Component	Role	Tools	Memory	Permissions
CSV importer	Load and sample raw data	Python CSV parsing	none	read local dataset only
Classifier Agent	Score engagement from `score` and `num_comments`	deterministic Python	blackboard write/read	no network
Policy Agent	Route posts to pass/review/reject	deterministic Python	blackboard write/read	no network
Trend Agent	Group passed posts into topic clusters	Anthropic structured output	blackboard write/read	outbound LLM only
Lens Agent	Produce alerts and CMO briefing	Anthropic structured output	blackboard write/read	outbound LLM only
Blackboard / SQLite	Shared state and audit log	in-memory + SQLite	persistent state/log	local file only
Appeal Agent stub	Record contested policy outcomes for later review	FastAPI endpoint	blackboard write/read	no model access
Human reviewer	Inspect medium-band posts and appeals	API review endpoint	blackboard read	human judgment

The current implementation covers the classifier, policy, and trend roles directly, plus auditor-like logging on the blackboard. Escalation exists as policy-to-review routing, and appeal exists as a lightweight stub that records contested decisions for later human handling.

Architecture

flowchart TD
    A[data/raw/reddit_books/posts.csv] --> B[ingestion/reddit_books.py]
    B --> C[CommunityPost schema]
    C --> D[Classifier Agent<br/>deterministic engagement score]
    D --> E[Policy Agent<br/>pass / review / reject]
    E -->|pass| F[Trend Agent<br/>LLM clustering]
    E -->|review| G[Review Queue]
    F --> H[Recurrence Policy<br/>4-bucket monthly rule]
    H --> I[Lens Agent<br/>LLM briefing + alerts]
    D --> J[Blackboard]
    E --> J
    F --> J
    I --> J
    J --> K[SQLite state + audit log]
    J --> L[FastAPI endpoints]

Communication Contract

The system uses a blackboard coordination pattern. Agents do not message each other directly; instead they publish typed results into shared state.

Core message shapes:

CommunityPost: normalized input record
ClassificationResult: post_id, engagement_score, engagement_tier, score_signal, comment_signal, reasoning
PolicyResult: post_id, status, passes_filter, filter_reason, policy_citation, confidence
TrendOutput: topic clusters, support share, recurrence status, dominant theme
LensOutput: health index, engagement-weighted sentiment, narrative, alerts, briefing
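The message shapes could be expressed as typed records along these lines. This is a sketch using plain dataclasses: the field names come from the list above, but the types, the `Literal` value sets, and the `passes_filter` consistency check are assumptions, and schemas.py may use a different validation library.

```python
from dataclasses import asdict, dataclass
from typing import Literal

EngagementTier = Literal["high", "medium", "low"]
PolicyStatus = Literal["pass", "review", "reject"]


@dataclass(frozen=True)
class ClassificationResult:
    post_id: str
    engagement_score: float
    engagement_tier: EngagementTier
    score_signal: float
    comment_signal: float
    reasoning: str


@dataclass(frozen=True)
class PolicyResult:
    post_id: str
    status: PolicyStatus
    passes_filter: bool
    filter_reason: str
    policy_citation: str
    confidence: str

    def __post_init__(self) -> None:
        # Assumed invariant: only "pass" records may enter trend synthesis.
        if self.passes_filter != (self.status == "pass"):
            raise ValueError(f"{self.post_id}: passes_filter contradicts status {self.status!r}")


policy = PolicyResult(
    post_id="t3_example",
    status="review",
    passes_filter=False,
    filter_reason="Post sits in the medium-engagement band and stays for review.",
    policy_citation="Policy 1.2",
    confidence="medium",
)
print(asdict(policy)["status"])
```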

Routing:

importer writes normalized posts into the batch flow
classifier writes classification_<post_id>
policy reads classification and writes policy_<post_id>
review queue is derived from policy_* records with status review
trend reads only passed posts
lens reads batch summary plus trend output
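A minimal sketch of the blackboard pattern described above, assuming an in-memory dict for current state and an append-only SQLite table for the audit log. The class name, method names, and table layout are hypothetical; only the `classification_<post_id>` / `policy_<post_id>` key scheme and the derived review queue come from this README.

```python
from __future__ import annotations

import json
import sqlite3
from datetime import datetime, timezone
from typing import Any


class Blackboard:
    """In-memory shared state plus an append-only SQLite audit log (illustrative only)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._state: dict[str, Any] = {}
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS audit_log ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, ts TEXT, agent TEXT, key TEXT, value TEXT)"
        )

    def write(self, agent: str, key: str, value: dict[str, Any]) -> None:
        self._state[key] = value
        ts = datetime.now(timezone.utc).isoformat()
        with self._db:  # commits the insert, or rolls back on error
            self._db.execute(
                "INSERT INTO audit_log (ts, agent, key, value) VALUES (?, ?, ?, ?)",
                (ts, agent, key, json.dumps(value)),
            )

    def read(self, key: str) -> dict[str, Any] | None:
        return self._state.get(key)

    def review_queue(self) -> list[dict[str, Any]]:
        # Derived view: policy_* records whose status is "review"; nothing is stored separately.
        return [
            value
            for key, value in self._state.items()
            if key.startswith("policy_") and value.get("status") == "review"
        ]

    def log(self) -> list[tuple[str, str, str]]:
        return self._db.execute("SELECT ts, agent, key FROM audit_log ORDER BY id").fetchall()


bb = Blackboard()
bb.write("classifier", "classification_t3_a", {"post_id": "t3_a", "engagement_tier": "medium"})
bb.write("policy", "policy_t3_a", {"post_id": "t3_a", "status": "review"})
bb.write("policy", "policy_t3_b", {"post_id": "t3_b", "status": "pass"})
print([item["post_id"] for item in bb.review_queue()])
```

Because the review queue is computed from policy records on read, there is no second copy of review state that could drift out of sync.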

Escalation:

medium-engagement posts are routed to review
review items are exposed through GET /review
contested decisions can be posted to POST /appeals
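The escalation rule can be sketched as a small routing table keyed by engagement tier. Only the medium row mirrors the example contract (status `review`, `Policy 1.2`); the high and low citations, the confidence labels, and the fallback for unknown tiers are placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDecision:
    status: str  # "pass" | "review" | "reject"
    passes_filter: bool
    filter_reason: str
    policy_citation: str
    confidence: str


# Hypothetical routing table: only the "medium" row mirrors the README example.
_ROUTES = {
    "high": PolicyDecision(
        "pass", True, "High engagement; eligible for trend synthesis.", "Policy 1.1", "high"
    ),
    "medium": PolicyDecision(
        "review", False, "Post sits in the medium-engagement band and stays for review.",
        "Policy 1.2", "medium",
    ),
    "low": PolicyDecision(
        "reject", False, "Low engagement; excluded from trend synthesis.", "Policy 1.3", "high"
    ),
}


def route(engagement_tier: str) -> PolicyDecision:
    try:
        return _ROUTES[engagement_tier]
    except KeyError:
        # Unknown tiers fail closed to human review rather than silently passing.
        return PolicyDecision(
            "review", False, f"Unrecognized tier {engagement_tier!r}.", "n/a (fallback)", "low"
        )


print(route("medium").status, route("medium").policy_citation)
```

Failing closed to review keeps an unexpected classifier output from silently entering trend synthesis, at the cost of extra review load.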

Example message contract:

{
  "post_id": "t3_example",
  "classification": {
    "engagement_score": 0.54,
    "engagement_tier": "medium",
    "score_signal": 0.61,
    "comment_signal": 0.39,
    "reasoning": "Engagement score is derived from score and num_comments using a weighted log-scale heuristic."
  },
  "policy": {
    "post_id": "t3_example",
    "status": "review",
    "passes_filter": false,
    "filter_reason": "Post sits in the medium-engagement band and stays for review.",
    "policy_citation": "Policy 1.2",
    "confidence": "medium"
  }
}

The persistence logic for shared state keys lives in blackboard.py and storage.py.
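The classifier's reasoning string mentions a weighted log-scale heuristic over `score` and `num_comments`. One way such a heuristic could look is sketched below; every cap, weight, and threshold is an assumption rather than a value from the repository. With these weights, the example's signals (0.61 and 0.39) combine to about 0.54, which lands in the medium band.

```python
import math
from dataclasses import dataclass

# All constants are illustrative assumptions, not values taken from the repository.
SCORE_CAP = 5000        # upvote score treated as "maximal" attention
COMMENT_CAP = 500       # comment count treated as "maximal" discussion
SCORE_WEIGHT = 0.7
COMMENT_WEIGHT = 0.3
HIGH_THRESHOLD = 0.66
MEDIUM_THRESHOLD = 0.33


def log_signal(value: int, cap: int) -> float:
    """Map a raw count onto [0, 1] with log scaling, capped at 1.0."""
    if value <= 0:
        return 0.0  # negative or zero scores carry no positive attention signal
    return min(1.0, math.log1p(value) / math.log1p(cap))


@dataclass(frozen=True)
class Engagement:
    engagement_score: float
    engagement_tier: str
    score_signal: float
    comment_signal: float


def classify(score: int, num_comments: int) -> Engagement:
    score_signal = log_signal(score, SCORE_CAP)
    comment_signal = log_signal(num_comments, COMMENT_CAP)
    combined = SCORE_WEIGHT * score_signal + COMMENT_WEIGHT * comment_signal
    if combined >= HIGH_THRESHOLD:
        tier = "high"
    elif combined >= MEDIUM_THRESHOLD:
        tier = "medium"
    else:
        tier = "low"
    return Engagement(round(combined, 2), tier, round(score_signal, 2), round(comment_signal, 2))


print(classify(100, 20))
```

Log scaling compresses the gap between a post with 5,000 upvotes and one with 500, which keeps a handful of viral posts from monopolizing the high tier.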

Coordination Choice

This system uses a blackboard architecture.

Why blackboard over a single supervisor:

each stage produces an inspectable artifact
state is naturally auditable and replayable
downstream agents can be changed without rewriting upstream logic
human review fits naturally as another consumer of shared state

Why not consensus or market-based coordination:

the task is sequential and evidence-accumulating, not adversarial
there is little value in agent bidding or multi-vote consensus for this small prototype
the main need is traceability, not decentralized negotiation

Incentive Analysis

The current system is cooperative, not competitive.

local objective of classifier: assign a stable engagement tier from available signals
local objective of policy: avoid contaminating trend synthesis with weak posts
local objective of trend agent: maximize topical coherence across passed posts
local objective of lens agent: maximize decision usefulness without overstating evidence
global objective: produce actionable but cautious community intelligence

Potential incentive misalignment:

a strict classifier improves precision but can lower coverage
a permissive policy increases recall but may reduce downstream quality
a summarizer can sound more confident than the evidence warrants

The current design mitigates these tensions by making routing deterministic and by surfacing signal quality directly in the final output.

Emergence

Expected emergent behavior:

recurring clusters may reveal themes not obvious from any one post
interaction between engagement filtering and recurrence policy can surface stronger long-horizon signals than raw popularity alone

Unwanted emergent behavior:

over-representation of high-score posts can bias the narrative toward already popular topics
medium-band posts routed to review may create blind spots if no human reviews them
Lens may produce a persuasive narrative from a weak sample unless the prompts and signal-quality warnings stay strict
repeated monthly sampling could stabilize around the sampling heuristic rather than the real community distribution

Safety And Governance

human-in-the-loop: medium-band posts go to /review
appeal capture: contested decisions can be recorded via /appeals
audit log: every write is timestamped and persisted in SQLite
rollback: local state can be wiped with /system/reset-local-state
budget safety: local spend reservations and a hard app budget exist in budget.py
bounded output: Trend and Lens use structured schema validation
sample-scope warnings: prompts require explicit mention that this is a historical sampled diagnostic
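Bounded output depends on rejecting any model response that does not fit the expected shape. The sketch below uses only the standard library; the key names (`clusters`, `label`, `support_share`, `recurrence_status`, `dominant_theme`) are hypothetical renderings of the TrendOutput fields, and the project may rely on Pydantic or the model provider's structured-output features instead.

```python
from __future__ import annotations

import json
from typing import Any

ALLOWED_STATUS = {"signal", "trend"}


class SchemaError(ValueError):
    pass


def parse_trend_output(raw: str) -> dict[str, Any]:
    """Parse model output and reject anything outside the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("top level must be an object")
    clusters = data.get("clusters")
    if not isinstance(clusters, list):
        raise SchemaError("'clusters' must be a list")
    for i, cluster in enumerate(clusters):
        if not isinstance(cluster, dict):
            raise SchemaError(f"cluster {i} must be an object")
        label = cluster.get("label")
        if not isinstance(label, str) or not label.strip():
            raise SchemaError(f"cluster {i}: 'label' must be a non-empty string")
        share = cluster.get("support_share")
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(share, bool) or not isinstance(share, (int, float)) or not 0.0 <= share <= 1.0:
            raise SchemaError(f"cluster {i}: 'support_share' must be a number in [0, 1]")
        if cluster.get("recurrence_status") not in ALLOWED_STATUS:
            raise SchemaError(f"cluster {i}: 'recurrence_status' must be one of {sorted(ALLOWED_STATUS)}")
    if not isinstance(data.get("dominant_theme"), str):
        raise SchemaError("'dominant_theme' must be a string")
    return data
```

A response that fails any check is rejected outright instead of being partially trusted, which keeps malformed clusters out of the Lens stage.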

Known failure cases:

long-tailed but important low-engagement discussions are filtered out
the sample is historical and engagement-weighted, not population-representative
comment count and score are only proxies for importance, not truth or safety
appeal capture exists, but final adjudication is still human and not automated

Operations

Observability:

GET /system/health
GET /system/budget
GET /blackboard
GET /blackboard/log
GET /history/trends
CLI summary output in run_reddit_books.py

Human controls:

confirm_spend=true for paid calls
reset endpoint for local rollback
review queue for medium-band items
appeal queue for contested policy outcomes
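The spend controls (confirm_spend plus a hard budget with local reservations) could follow a reserve-then-settle pattern like the one below. `SpendGuard`, its methods, and the dollar figures in the usage are hypothetical; budget.py may be structured differently.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class BudgetExceeded(RuntimeError):
    pass


class SpendGuard:
    """Reserve an estimated cost before a paid call, then settle the actual cost."""

    def __init__(self, hard_limit_usd: float) -> None:
        self.hard_limit_usd = hard_limit_usd
        self.spent_usd = 0.0
        self.reserved_usd = 0.0
        self._lock = threading.Lock()

    @contextmanager
    def reserve(self, estimate_usd: float, confirm_spend: bool) -> Iterator[dict[str, float]]:
        if not confirm_spend:
            raise PermissionError("paid calls require confirm_spend=true")
        with self._lock:
            committed = self.spent_usd + self.reserved_usd
            if committed + estimate_usd > self.hard_limit_usd:
                raise BudgetExceeded(
                    f"reserving ${estimate_usd:.2f} would exceed the ${self.hard_limit_usd:.2f} limit"
                )
            self.reserved_usd += estimate_usd
        # If the caller never reports a real cost, the estimate is charged conservatively.
        receipt = {"actual_usd": estimate_usd}
        try:
            yield receipt
        finally:
            with self._lock:
                self.reserved_usd -= estimate_usd
                self.spent_usd += receipt["actual_usd"]


guard = SpendGuard(hard_limit_usd=0.10)
with guard.reserve(0.05, confirm_spend=True) as receipt:
    receipt["actual_usd"] = 0.03  # hypothetical cost reported by the paid call
print(round(guard.spent_usd, 2))
```

Reserving before the call, under a lock, means two concurrent runs cannot both pass the budget check and then jointly overspend.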

Evaluation:

agent level: classifier tier stability, policy routing distribution
interaction level: number of passed posts, review load, trend-cluster coherence
system level: signal quality, confirmed trend count, alert usefulness, cost
human level: whether analysts agree the alerts are useful and appropriately cautious
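Two of the interaction-level measures, routing distribution and review load, reduce to simple counts over policy statuses. A minimal sketch, assuming the statuses are the strings `pass`, `review`, and `reject`; the function name and output keys are hypothetical.

```python
from collections import Counter


def routing_metrics(statuses: list[str]) -> dict[str, float]:
    """Summarize policy routing for one run: share per status plus review load."""
    total = len(statuses)
    if total == 0:
        return {"total": 0, "pass_rate": 0.0, "review_load": 0, "reject_rate": 0.0}
    counts = Counter(statuses)
    return {
        "total": total,
        "pass_rate": counts["pass"] / total,
        "review_load": counts["review"],  # absolute count: how many items a human must read
        "reject_rate": counts["reject"] / total,
    }


print(routing_metrics(["pass", "pass", "review", "reject"]))
```

For the recent sample run reported under Prototype Evidence (28 of 60 posts passed), `pass_rate` would be about 0.47.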

Interoperability

This prototype is local and Python-native, but the boundaries are already clear:

classifier and policy could be exposed as internal deterministic services
trend and lens could become A2A-style specialist agents with typed contracts
blackboard storage is the natural seam for MCP-style tool access to state, logs, and review queues

The most important interoperability requirement is stable schemas, not model swapping.

MARL Bridge

Multi-agent reinforcement learning is not appropriate for the current stage.

Why not:

the task is dominated by auditability and policy clarity, not exploration
reward design would be brittle and hard to validate
human trust would likely decrease if routing policy emerged from opaque RL

Where MARL could matter later:

sampling policy optimization
reviewer-assignment scheduling
adaptive escalation thresholds under tightly controlled offline simulation

For this prototype, deterministic routing plus prompted synthesis is the better engineering choice.

Prototype Evidence

The repo contains:

a working CLI run in run_reddit_books.py
a mock simulation endpoint in main.py
local tests covering classifier, policy, orchestration, storage, time windows, and API behavior

Recent sample run:

60 sampled posts
28 passed into trend synthesis
4 supported themes in the latest live run
recurrence is still checked in the backend, but the dashboard emphasizes supported themes over strict trend counts
3 alerts
run cost under $0.07

Worked scenario:

The importer samples 60 historical r/books posts across four weekly buckets.
The classifier assigns each post high, medium, or low engagement from score and num_comments.
The policy agent sends medium posts to the review queue and excludes low posts from trend synthesis.
The trend agent clusters only the passed posts and labels every cluster as a signal or a trend based on four-bucket recurrence.
The lens agent turns those clusters into alerts and a marketing briefing while explicitly warning when evidence is weak.
In the latest run, the system found 11 topic clusters, 0 confirmed trends, and 3 alerts. That is a worked example of useful coordination without overstating confidence.
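The support and recurrence steps in this scenario can be sketched as a single cluster-judging function. The four weekly buckets come from the README; the minimum support share, the rule that a trend must appear in all four buckets, and all names are assumptions, since the README does not spell out the exact recurrence rule.

```python
from dataclasses import dataclass

NUM_BUCKETS = 4          # four weekly buckets per monthly sample
MIN_SUPPORT_SHARE = 0.1  # hypothetical: clusters below 10% of passed posts are suppressed
MIN_TREND_BUCKETS = 4    # hypothetical: a "trend" must recur in every bucket


@dataclass(frozen=True)
class ClusterVerdict:
    label: str
    support_share: float
    buckets_present: int
    status: str  # "trend", "signal", or "suppressed"


def judge_cluster(label: str, post_buckets: list[int], total_passed: int) -> ClusterVerdict:
    """post_buckets holds the weekly bucket index (0-3) of each post in the cluster."""
    if any(b not in range(NUM_BUCKETS) for b in post_buckets):
        raise ValueError(f"bucket indices must be in 0..{NUM_BUCKETS - 1}")
    share = len(post_buckets) / total_passed if total_passed else 0.0
    present = len(set(post_buckets))
    if share < MIN_SUPPORT_SHARE:
        status = "suppressed"
    elif present >= MIN_TREND_BUCKETS:
        status = "trend"
    else:
        status = "signal"
    return ClusterVerdict(label, round(share, 3), present, status)


print(judge_cluster("Reading slumps", [0, 0, 1], total_passed=28))
```

Applying the support check first means a thin cluster is suppressed even if it happens to touch every bucket.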

How To Experience The Demo

If you want to understand the system quickly, use the dashboard instead of the raw API docs.

Start the app.
Open http://127.0.0.1:8000/.
Click Preview Monthly Sample to inspect the historical input window.
Click Generate Intelligence View to run the classifier, policy, trend, and lens pipeline.
Read the page in this order:
- Intelligence View
- Community Health
- Alert Stack
- Appeals & Escalations
- Engagement Breakdown
- System Status

What to notice during the demo:

passed posts are fewer than sampled posts because filtering is intentional
weak micro-topics are suppressed when support is too thin
recurrence is still checked, but the product surface emphasizes supported themes
the final layer stays cautious about evidence quality

For a shorter walkthrough, see docs/demo_script.md.

Setup

uv sync
Copy-Item .env.example .env
uv run uvicorn main:app --reload

Open http://127.0.0.1:8000/docs.

For the cleaner presentation layer, open http://127.0.0.1:8000/.

Presentation-friendly endpoints:

GET /alerts returns counts plus alert items
GET /review returns a count plus presentation-friendly review items
GET /appeals returns the current appeal queue
POST /appeals records a contested decision for later human handling

Useful Commands

uv run pytest
uv run ruff check .
uv run python reset_local_state.py --confirm-reset
uv run python run_reddit_books.py --confirm-spend
uv run python run_reddit_books.py --confirm-spend --json

Dataset

The current input is posts.csv, an engagement-weighted historical r/books sample. The untouched source archive is preserved beside it as source_archive.zip.
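One plausible way to draw an engagement-weighted sample per weekly bucket is weighted sampling without replacement using Efraimidis-Spirakis keys (each item gets key u ** (1 / w) for uniform u, and the k largest keys win), sketched below. The README does not describe the actual sampler, so the bucket boundaries (days 0 to 27 of the window, with later days dropped), the weight formula, and all function names are assumptions.

```python
from __future__ import annotations

import random
from collections import defaultdict
from datetime import datetime, timezone


def weekly_bucket(created_utc: float, month_start: datetime) -> int | None:
    """Return bucket 0-3 for days 0-27 after month_start (timezone-aware), else None."""
    days = (datetime.fromtimestamp(created_utc, tz=timezone.utc) - month_start).days
    if 0 <= days < 28:
        return days // 7
    return None


def weighted_sample(posts: list[dict], k: int, rng: random.Random) -> list[dict]:
    """Weighted sampling without replacement via Efraimidis-Spirakis keys."""
    keyed = []
    for post in posts:
        # Hypothetical weight; the +1 keeps every weight strictly positive.
        weight = 1.0 + max(post["score"], 0) + max(post["num_comments"], 0)
        keyed.append((rng.random() ** (1.0 / weight), post))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in keyed[:k]]


def sample_month(
    posts: list[dict], month_start: datetime, sample_per_bucket: int, seed: int = 0
) -> dict[int, list[dict]]:
    rng = random.Random(seed)  # fixed seed keeps the preview reproducible
    buckets: dict[int, list[dict]] = defaultdict(list)
    for post in posts:
        bucket = weekly_bucket(post["created_utc"], month_start)
        if bucket is not None:
            buckets[bucket].append(post)
    return {b: weighted_sample(buckets[b], sample_per_bucket, rng) for b in range(4)}
```

With `sample_per_bucket=15` and four buckets, this yields up to 60 posts, matching the size of the worked scenario.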

Preview without cost:

GET /datasets/reddit-books/preview?sample_per_bucket=15

Run the paid analysis:

POST /datasets/reddit-books/run?sample_per_bucket=15&confirm_spend=true

With the current local classifier, only Trend and Lens call Anthropic during a dataset run.
