feat: Telegram file/image support, long-term memory, parallel ad-hoc subagents, zero-trust verifier by Copilot · Pull Request #31 · chinkan/RustFox

Copilot · 2026-03-25T10:16:42Z

Features

Telegram File/Image Support

Previously, the Telegram integration handled text messages only. This PR adds handling for photos and file attachments (PDF, DOCX, images), routing them through a vision/OCR/document-extraction pipeline before injecting the resulting context into the LLM.

IncomingMessage gains attachments: Vec<Attachment>
telegram.rs handles msg.photo() and msg.document() with per-request temp dirs
ChatMessage.content promoted from Option<String> to MessageContent (Text or Parts for multi-modal)
Images: if model supports vision → base64 multi-modal; otherwise → OCR via ocrs (pure Rust, no system deps)
PDF: text extraction via pdf-extract
DOCX: text extraction via docx-rs
Long content (>6000 chars): chunked (1000-char overlap), stored via EmbeddingEngine + sqlite-vec, RAG-retrieved per query
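
The long-content rule above (chunk with a character overlap before storing for retrieval) can be sketched as follows. This is a minimal, dependency-free illustration and not the PR's actual `chunk_text`: the signature and the chunk size are assumptions, and only the overlap idea and character-based counting come from the description.

```rust
/// Split `text` into chunks of at most `chunk_chars` characters.
/// Every chunk after the first starts `overlap` characters before the end
/// of the previous one, so neighbouring chunks share some context.
/// Hypothetical signature: the PR does not show the real one.
fn chunk_text(text: &str, chunk_chars: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_chars > overlap, "chunk size must exceed the overlap");
    // Work on chars, not bytes, so multi-byte UTF-8 text is never split mid-character.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_chars - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_chars).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With the numbers from the description, a caller would only invoke this when the extracted text exceeds 6000 characters and would pass an overlap of 1000; shorter text would be injected into the prompt directly.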

Long-Term Memory & Startup/Shutdown

Conversations can be soft-archived instead of deleted
/clear now archives (searchable but excluded from active context)
McpManager gains server_count() for status
Bot sends startup and shutdown notifications to a designated chat
Graceful shutdown with signal handling
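
The soft-archive behaviour can be modelled in memory to make the rules concrete. The sketch below is an assumption-heavy stand-in for the SQLite-backed store: the `Store` type and its fields are hypothetical, while the method names `get_or_create`, `clear_conversation`, and `load_messages` and the `is_archived` flag are taken from the commit titles in this PR.

```rust
struct Conversation {
    id: u64,
    chat_id: i64,
    is_archived: bool,
    messages: Vec<String>,
}

/// Hypothetical in-memory stand-in for the real database-backed store.
#[derive(Default)]
struct Store {
    conversations: Vec<Conversation>,
}

impl Store {
    /// Return the active (non-archived) conversation for a chat, creating one if needed.
    fn get_or_create(&mut self, chat_id: i64) -> u64 {
        if let Some(c) = self
            .conversations
            .iter()
            .find(|c| c.chat_id == chat_id && !c.is_archived)
        {
            return c.id;
        }
        let id = self.conversations.len() as u64 + 1;
        self.conversations.push(Conversation {
            id,
            chat_id,
            is_archived: false,
            messages: Vec::new(),
        });
        id
    }

    fn add_message(&mut self, conversation_id: u64, text: &str) {
        if let Some(c) = self.conversations.iter_mut().find(|c| c.id == conversation_id) {
            c.messages.push(text.to_string());
        }
    }

    /// `/clear`: mark conversations as archived instead of deleting them.
    fn clear_conversation(&mut self, chat_id: i64) {
        for c in self.conversations.iter_mut().filter(|c| c.chat_id == chat_id) {
            c.is_archived = true;
        }
    }

    /// Active context: archived conversations are skipped.
    fn load_messages(&self, chat_id: i64) -> Vec<&str> {
        self.conversations
            .iter()
            .filter(|c| c.chat_id == chat_id && !c.is_archived)
            .flat_map(|c| c.messages.iter().map(String::as_str))
            .collect()
    }

    /// Search: archived conversations are still included.
    fn search(&self, needle: &str) -> Vec<&str> {
        self.conversations
            .iter()
            .flat_map(|c| c.messages.iter().map(String::as_str))
            .filter(|m| m.contains(needle))
            .collect()
    }
}
```

After a clear, `load_messages` returns nothing for that chat while `search` still finds the old messages, and the next `get_or_create` starts a fresh conversation.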

Parallel Ad-Hoc Subagents

spawn_agents tool: spawn subagents with inline system prompts — no AGENT.md needed
spawn_agents(tasks=[{system_prompt, prompt}, ...]): runs the batch in parallel via join_all
Multiple spawn_agents or invoke_agent calls in one LLM response run concurrently
System context (date/time, user model, location) auto-injected into all subagents
Removed deprecated invoke_subagent — use invoke_agent with skill fallback
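
To illustrate the fan-out/fan-in shape of a `spawn_agents` batch without pulling in an async runtime, the sketch below swaps the PR's async `join_all` for scoped OS threads from the standard library. It is not the PR's implementation: `AgentTask` and `run_subagent` are hypothetical stand-ins, and the "subagent" only formats a string.

```rust
use std::thread;

/// One entry of the `tasks` array: an inline system prompt plus the user prompt.
struct AgentTask {
    system_prompt: String,
    prompt: String,
}

/// Hypothetical stand-in for a real subagent run (which would call the LLM).
fn run_subagent(task: &AgentTask) -> String {
    format!("[{}] {}", task.system_prompt, task.prompt)
}

/// Run every task concurrently and return the results in input order.
fn spawn_agents(tasks: &[AgentTask]) -> Vec<String> {
    thread::scope(|scope| {
        // Start all subagents first...
        let handles: Vec<_> = tasks
            .iter()
            .map(|task| scope.spawn(move || run_subagent(task)))
            .collect();
        // ...then wait for each one; joining in order keeps results aligned with inputs.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("subagent thread panicked"))
            .collect::<Vec<String>>()
    })
}
```

The property this shares with `join_all` is that results come back in the same order as the submitted tasks, regardless of which subagent finishes first.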

Zero-Trust Verifier

Predefined agent at agents/verifier/AGENT.md with skip_bootstrap: true
Invoked via invoke_agent(agent="verifier", prompt="TASK:...\nCRITERIA:...\nEVIDENCE:...")
Read-only sandbox access: read_file, list_files, plan_view — no write tools
Returns structured <evaluation>PASS/NEEDS_IMPROVEMENT/FAIL</evaluation>
Verification Protocol added to system prompt (verify before ending, iterate on rejection)
Agent discovery notes: system prompt tells LLM not to search for agent files via list_files
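
A caller has to turn the verifier's reply into a decision. A minimal sketch of extracting the `<evaluation>` verdict, assuming the tag appears once and contains exactly one of the three values listed above (the `Verdict` enum and `parse_evaluation` function are hypothetical names, not taken from the PR):

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    NeedsImprovement,
    Fail,
}

/// Extract the verdict from the first `<evaluation>...</evaluation>` pair.
/// Returns `None` when the tag is missing or holds an unknown value, so the
/// caller can treat an unparseable reply as "not verified".
fn parse_evaluation(reply: &str) -> Option<Verdict> {
    const OPEN: &str = "<evaluation>";
    const CLOSE: &str = "</evaluation>";
    let start = reply.find(OPEN)? + OPEN.len();
    let end = start + reply[start..].find(CLOSE)?;
    match reply[start..end].trim() {
        "PASS" => Some(Verdict::Pass),
        "NEEDS_IMPROVEMENT" => Some(Verdict::NeedsImprovement),
        "FAIL" => Some(Verdict::Fail),
        _ => None,
    }
}
```

Returning `None` rather than defaulting to a pass fits the zero-trust intent: only an explicit PASS ends the iteration.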

`skip_bootstrap` Frontmatter

New skip_bootstrap: true flag in AGENT.md/SKILL.md frontmatter
When set, the agent/skill body is used directly as the system message — no "read your instructions" bootstrap step
Enables simple evaluator agents like the verifier
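
The effect of the flag can be sketched with a small frontmatter splitter. This is an illustration under stated assumptions, not the PR's parser: it expects `---` delimiters with LF line endings, treats frontmatter as flat `key: value` lines, and the bootstrap wording in the fallback branch is invented for the example.

```rust
/// Split a document into (frontmatter, body). Assumes the file starts with
/// a `---` line and that the frontmatter ends at the next `---` line.
fn split_frontmatter(doc: &str) -> (Option<&str>, &str) {
    let Some(rest) = doc.strip_prefix("---\n") else {
        return (None, doc);
    };
    match rest.find("\n---\n") {
        Some(end) => (Some(&rest[..end]), &rest[end + "\n---\n".len()..]),
        None => (None, doc),
    }
}

/// True when the frontmatter contains the line `skip_bootstrap: true`.
fn skip_bootstrap(frontmatter: &str) -> bool {
    frontmatter.lines().any(|line| match line.split_once(':') {
        Some((key, value)) => key.trim() == "skip_bootstrap" && value.trim() == "true",
        None => false,
    })
}

/// Build the subagent's system message from an AGENT.md / SKILL.md file.
fn system_message(doc: &str, agent_name: &str) -> String {
    let (frontmatter, body) = split_frontmatter(doc);
    if frontmatter.map_or(false, skip_bootstrap) {
        // Flag set: the body itself (frontmatter stripped) is the system message.
        body.trim().to_string()
    } else {
        // Hypothetical bootstrap wording; the real prompt text is not shown in the PR.
        format!("Read the instructions for agent '{agent_name}' before starting.")
    }
}
```

For an evaluator such as the verifier, this removes one tool round-trip: the instructions are already in the system message when the first LLM call is made.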

Context Compaction Improvements

Second-pass hard cap: if still >100KB after compaction, reduces preserved tool groups from 2 to 1
Image preservation: compact_tool_result now preserves MessageContent::Parts (image parts survive)
Assistant text compaction: long text-only assistant messages (>1KB) now compacted with preview
Retry optimization: prepare_messages_for_llm moved outside retry loops (compaction is invariant)
Naming fix: estimate_prompt_chars → estimate_prompt_bytes (measures bytes, not chars)
System prompt warning: logs a warning when system prompt exceeds 50KB
Marker wording: "characters" → "bytes" in compacted markers
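
Two of the items above are easy to show in code: why the estimator was renamed to bytes, and how the second-pass cap picks the number of preserved tool groups. The sketch is an assumption-based illustration: the `Message` type is a stand-in, `preserved_tool_groups` is a hypothetical helper, and treating 100KB as 100 * 1024 bytes is a guess, since the PR does not say which definition it uses.

```rust
/// Simplified stand-in for a chat message; the real type carries more fields.
struct Message {
    content: String,
}

/// `String::len` returns the UTF-8 byte length, not the character count,
/// which is why a name ending in `_bytes` describes this measurement accurately.
fn estimate_prompt_bytes(messages: &[Message]) -> usize {
    messages.iter().map(|m| m.content.len()).sum()
}

/// Assumed value for the hard cap described above.
const HARD_CAP_BYTES: usize = 100 * 1024;

/// Second pass: if the prompt is still over the cap after the first
/// compaction, keep only one recent tool group instead of two.
fn preserved_tool_groups(bytes_after_first_pass: usize) -> usize {
    if bytes_after_first_pass > HARD_CAP_BYTES {
        1
    } else {
        2
    }
}
```

For non-ASCII text the two measures diverge: "héllo" is 5 characters but 6 bytes, so a limit expressed in one unit and reported in the other would be misleading.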

Code Quality

Removed dead AgentKind enum (only Agent variant remained)
Added #![deny(dead_code)] to lib.rs to prevent future dead code

New Dependencies

| Crate | Purpose |
| --- | --- |
| `ocrs` | OCR fallback for non-vision models (pure Rust) |
| `rten` | Neural network inference runtime for ocrs |
| `image` | Image loading/decoding |
| `pdf-extract` | PDF text extraction |
| `docx-rs` | DOCX parsing |
| `infer` | MIME type detection from bytes |
| `base64` | Base64 encoding for vision API |

chinkan · 2026-03-25T10:30:15Z

@copilot use ocrs https://github.com/robertknight/ocrs for image OCR instead of tesseract


Add MessageContent enum supporting Text(String) and Parts(Vec<ContentPart>) to replace Option<String> in ChatMessage.content. Backwards-compatible: Text variant serializes as plain JSON string, Parts as JSON array.

- Add ContentPart, ImageUrlContent, MessageContent types to llm.rs
- Update ChatMessage.content: Option<String> → Option<MessageContent>
- Fix all usage sites in agent.rs, memory/conversations.rs, memory/query_rewriter.rs, memory/summarizer.rs, memory/rag.rs
- DB storage continues to use extracted plain text via as_text()
- All 113 tests pass

Co-authored-by: Copilot <[email protected]>
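
The shape of the new content type and the `as_text()` extraction used for database storage might look like the sketch below. The type names, `from_text`, and `as_text` appear in this PR; the variant fields and the choice to join text parts with newlines are assumptions, and serde serialization is left out to keep the example dependency-free.

```rust
/// One element of a multimodal message. Field layout is assumed.
#[derive(Debug, Clone, PartialEq)]
enum ContentPart {
    Text { text: String },
    ImageUrl { url: String },
}

/// Message content: plain text, or a list of parts for multimodal input.
#[derive(Debug, Clone, PartialEq)]
enum MessageContent {
    Text(String),
    Parts(Vec<ContentPart>),
}

impl MessageContent {
    fn from_text(text: impl Into<String>) -> Self {
        MessageContent::Text(text.into())
    }

    /// Plain-text view used for storage and summarization:
    /// image parts are dropped, text parts are joined with newlines.
    fn as_text(&self) -> String {
        match self {
            MessageContent::Text(text) => text.clone(),
            MessageContent::Parts(parts) => parts
                .iter()
                .filter_map(|part| match part {
                    ContentPart::Text { text } => Some(text.as_str()),
                    ContentPart::ImageUrl { .. } => None,
                })
                .collect::<Vec<_>>()
                .join("\n"),
        }
    }
}
```

Keeping a text-only projection means base64 image data never reaches the message history table, which matches the stated goal of storing extracted plain text only.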

- Add `supports_vision` bool field to `OpenRouterConfig` (default false)
- Add `OcrConfig` struct with `model_dir` (default $HOME/.cache/ocrs)
- Add `ocr` field to `Config` with serde default
- Add 3 tests: supports_vision defaults false, parses true, ocr default dir
- Update config.example.toml with comments for new fields

Co-authored-by: Copilot <[email protected]>


- Add src/file_processor/mod.rs with process_attachments() entry point
- Image handling: base64 vision parts (supports_vision=true) or OCR via ocrs
- OCR: auto-downloads text-detection/recognition .rten models on first use
- PDF: text extraction via pdf_extract::extract_text_from_mem
- DOCX: text extraction by traversing docx_rs document children
- Long context (>6000 chars): chunk+store in MemoryStore, RAG-retrieve
- Unit tests for chunk_text (4 cases)
- Register module via mod file_processor; in src/main.rs

Co-authored-by: Copilot <[email protected]>


- Replace early-return for non-text messages with caption fallback
- Add download_telegram_file helper (creates temp dir, fetches via Telegram API, detects MIME with infer crate)
- Add classify_attachment_kind and mime_from_extension helpers
- Collect photo and document attachments into Vec<Attachment>
- Pass attachments to IncomingMessage instead of empty vec
- Clean up temp dir after agent processing completes
- Add 5 unit tests for classify_attachment_kind

Co-authored-by: Copilot <[email protected]>
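
The two classification helpers named in this commit could plausibly look like the sketch below. Only the function names and the `AttachmentKind` type name come from the PR; the variants, signatures, and the exact set of recognized extensions are assumptions made for illustration.

```rust
/// Hypothetical variants; the PR only names the type.
#[derive(Debug, PartialEq)]
enum AttachmentKind {
    Image,
    Pdf,
    Docx,
    Other,
}

const DOCX_MIME: &str =
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

/// Fallback when byte sniffing finds nothing: guess the MIME type from the extension.
fn mime_from_extension(ext: &str) -> Option<&'static str> {
    match ext.to_ascii_lowercase().as_str() {
        "jpg" | "jpeg" => Some("image/jpeg"),
        "png" => Some("image/png"),
        "pdf" => Some("application/pdf"),
        "docx" => Some(DOCX_MIME),
        _ => None,
    }
}

/// Map a MIME type onto the processing route used by the pipeline.
fn classify_attachment_kind(mime: &str) -> AttachmentKind {
    match mime {
        m if m.starts_with("image/") => AttachmentKind::Image,
        "application/pdf" => AttachmentKind::Pdf,
        DOCX_MIME => AttachmentKind::Docx,
        _ => AttachmentKind::Other,
    }
}
```

Classifying on the detected MIME type rather than the user-supplied filename means a mislabeled upload is routed by what it actually contains, with the extension used only as a fallback.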


Copilot · 2026-03-25T11:21:56Z

@copilot use ocrs https://github.com/robertknight/ocrs for image OCR instead of tesseract

Switched OCR from tesseract to ocrs (https://github.com/robertknight/ocrs) — pure Rust, neural-network-based, no system library dependencies. Implementation in src/file_processor/mod.rs uses ocrs::OcrEngine with rten::Model via spawn_blocking. Models are auto-downloaded from S3 on first use to ~/.cache/ocrs/. Committed in 12e9337.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

docs.rs
- Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Copilot

Pull request overview

Adds Telegram media (photos + documents) support by downloading attachments to temp storage, extracting usable text (OCR/PDF/DOCX) and/or sending images as multimodal content to vision-capable models, with long-document fallback to the existing knowledge/vector store.

Changes:

Extend platform message model with typed attachments and add Telegram download + cleanup logic for photos/documents.
Promote LLM ChatMessage.content to a backwards-compatible multimodal representation (text or parts).
Add a file processing pipeline (vision base64, OCR via ocrs, PDF/DOCX extraction, long-context chunking + RAG retrieval) and wire it into the agent.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 11 comments.


| File | Description |
| --- | --- |
| src/supervisor/backend/reasoning.rs | Populates new `attachments` field for supervisor-originated messages. |
| src/platform/telegram.rs | Downloads Telegram photos/documents into temp dirs, classifies attachments, cleans up after processing, adds unit tests. |
| src/platform/mod.rs | Introduces `AttachmentKind`, `Attachment`, and `IncomingMessage.attachments`. |
| src/memory/summarizer.rs | Adapts summarizer to new `MessageContent` and `as_text()` extraction. |
| src/memory/rag.rs | Updates RAG snippet building to use `MessageContent::as_text()`. |
| src/memory/query_rewriter.rs | Updates query rewriter to build prompts from `MessageContent::as_text()`. |
| src/memory/conversations.rs | Stores/retrieves text-only content while mapping DB rows to `MessageContent::Text`. |
| src/llm.rs | Adds multimodal content types (`MessageContent`, `ContentPart`) with backward-compatible serialization + tests. |
| src/lib.rs | Exposes new `file_processor` module. |
| src/learning.rs | Updates learning flows to use `MessageContent` consistently. |
| src/file_processor/mod.rs | New attachment processing module: vision/OCR, PDF/DOCX extraction, chunking + knowledge-store indexing/retrieval. |
| src/config.rs | Adds `openrouter.supports_vision` and `[ocr].model_dir` defaults + tests. |
| src/agent.rs | Integrates attachment processing into message construction and persistence behavior. |
| src/agent_prompt.rs | Adjusts prompt estimation/compaction logic for `MessageContent`. |
| docs/plans/2026-03-25-telegram-file-image-support.md | Adds an implementation plan documenting the intended architecture and dependencies. |
| config.example.toml | Documents new `supports_vision` and `[ocr]` settings. |
| Cargo.toml | Adds new crates for OCR/image/PDF/DOCX/mime/base64 support. |
| Cargo.lock | Locks new transitive dependencies for added crates. |
| .gitignore | Updates ignored DB filename pattern to `rustfox.db*`. |


+    let dest_name = match filename {
+        Some(n) => n.to_string(),
+        None => format!("{}.{}", uuid::Uuid::new_v4(), ext),
+    };
+    let dest_path = dest_dir.join(&dest_name);


+    let ext = Path::new(&file.path)
+        .extension()
+        .and_then(|e| e.to_str())
+        .unwrap_or("bin");
+


+    dest_dir: &Path,
+    filename: Option<&str>,
+) -> Result<(PathBuf, String)> {
+    std::fs::create_dir_all(dest_dir).context("Failed to create temp directory")?;


+        .await
+        .context("Failed to download file from Telegram")?;
+
+    std::fs::write(&dest_path, &bytes).context("Failed to write downloaded file")?;


+    let response = reqwest::get(url)
+        .await
+        .context("Failed to fetch OCR model")?;
+    let bytes = response
+        .bytes()
+        .await
+        .context("Failed to read OCR model bytes")?;


+        // Save a text-only version to DB (avoid storing base64 image data in message history)
+        let db_content = if incoming.attachments.is_empty() {
+            user_msg_content.clone()
+        } else {
+            let mut db_text = incoming.text.clone();
+            if !attachment_text.is_empty() {
+                db_text.push_str("\n\n[Attachment processed]");
+            }
+            MessageContent::from_text(db_text)
+        };


+# Used to extract text from images when supports_vision = false.
+# Models are downloaded automatically on first use.
+# [ocr]
+# model_dir = "~/.cache/ocrs"  # Where OCR model files are cached (downloaded on first use)


+    tokio::task::spawn_blocking(move || -> Result<String> {
+        let detection_model =
+            rten::Model::load_file(&det_path).context("Failed to load OCR detection model")?;
+        let recognition_model =
+            rten::Model::load_file(&rec_path).context("Failed to load OCR recognition model")?;


+                match extract_pdf_text(&attachment.path) {
+                    Ok(text) => {
+                        let ctx = handle_context_length(&text, fname, user_query, memory).await;
+                        text_parts.push(ctx);
+                    }
+                    Err(e) => {
+                        tracing::warn!("PDF extraction failed: {}", e);
+                        text_parts.push(format!("[PDF processing failed: {}]", e));
+                    }
+                }


+                match extract_docx_text(&attachment.path) {
+                    Ok(text) => {
+                        let ctx = handle_context_length(&text, fname, user_query, memory).await;
+                        text_parts.push(ctx);
+                    }
+                    Err(e) => {
+                        tracing::warn!("DOCX extraction failed: {}", e);
+                        text_parts.push(format!("[DOCX processing failed: {}]", e));
+                    }
+                }



…rguments Switches compaction marker format from JSON to plain-text (COMPACTION_MARKER_PREFIX) and adds defensive detection in both the main agent loop and subagent tool dispatch. Also updates agent model references to opencode-go/* variants.

Split run_subagent into two paths:

- Ad-hoc mode (skill_name = None): use system_prompt + user_prompt directly with a default sandbox tool whitelist and ambient system context (timestamp, user model, location) auto-injected.
- Predefined mode: existing registry lookup, with new support for skip_bootstrap and YAML frontmatter stripping from the agent body.

The shared mini-agentic loop (LLM call -> empty-response recovery -> tool execution -> final response) is now extracted into a reusable run_subagent_loop helper used by both paths. The 'Agent {}' log prefixes are unified to 'Subagent {}' to match the new generic label. The invoke_agent tool handler is updated to pass the new signature.

…fixes

- Add spawn_agents tool for ad-hoc inline subagents (no AGENT.md needed)
- Parallel execution: multiple subagent calls run concurrently via join_all
- System context (date/time, user model, location) auto-injected into subagents
- Remove deprecated invoke_subagent (use invoke_agent with skill fallback)
- Add skip_bootstrap frontmatter flag for AGENT.md direct body injection
- Add agents/verifier/AGENT.md with read-only sandbox access (read_file, list_files, plan_view)
- Add Verification Protocol to system prompt (verify work before ending)
- Add agent discovery notes to system prompt (DO NOT list agent directories)
- Add SubagentsConfig with default_tools in config
- Compaction: fix bytes vs chars naming, preserve image parts in tool results
- Compaction: add second-pass hard cap (100KB) to prevent context overflow
- Compaction: move prepare_messages_for_llm outside retry loops
- Compaction: compact long assistant text-only messages
- Compaction: fix marker wording (bytes not characters)

- Remove AgentKind enum (only Agent variant remained after invoke_subagent removal)
- Remove kind parameter from run_subagent signature
- Simplify match blocks to plain code blocks
- Remove #[allow(dead_code)] from build_subagent_system_prompt (now used)
- Add #![deny(dead_code)] to lib.rs to prevent future dead code

…ier, compaction

- Add File & Image Support, Long-Context RAG, Long-Term Memory features
- Add Ad-Hoc Subagents, Zero-Trust Verifier to feature list
- Update agent tools table (add spawn_agents, remove invoke_subagent)
- Add file_processor/ and agent_prompt.rs to architecture diagram
- Add agents/verifier/ to directory tree
- Update Roadmap with completed items

Prepares src/bin/setup.rs for the setup wizard redesign by introducing TOML parse structs that mirror the expanded config surface used by the main bot (agent, langsmith, embedding, ocr, learning, supervisor, subagents, skills, agents_config).

- Add RawAgent, RawLangSmith, RawEmbedding, RawOcr, RawLearning, RawSupervisor (with RawSupervisorRisk), RawSubagents, RawSkills, RawAgentsConfig: all Clone, all fields optional
- Extend RawConfig with the new optional sections
- Extend RawOpenRouter with base_url + supports_vision
- Extend RawMemory with query_rewriter_enabled
- Extend RawGeneral with home (consumed by load_config)
- Extend ExistingConfig with the 19 new fields surfaced to the wizard
- Wire load_config to populate the new fields from parsed sections

CI-clean: cargo fmt --check, cargo clippy -- -D warnings, and all 15 existing tests pass.

Restructure the setup wizard from 7 flat steps to 4 wizard steps plus a success page, with collapsible Advanced sections and a global 'Show all settings' toggle. This is Task 3 of the wizard redesign (see docs/superpowers/plans/2026-06-11-setup-wizard-redesign.md).

Step layout:

1. Bot & LLM Setup: required bot token, user IDs, OR key + Advanced: model, prompt, vision, etc.
2. Location & Storage: location, sandbox/db/skills/agents dirs + Advanced: query rewriting, agent loop
3. Integrations: MCP catalog (moved from step 5) + Advanced: LangSmith, Embedding, Learning
4. Review & Save: summary of all settings + raw TOML preview
5. Success: done page

State, validation, collect, loadExistingConfig, and generateToml all extended to cover the new fields (supports_vision, base_url, skills_dir, agents_dir, home_dir, ocr_model_dir, agent_max_iterations, agent_empty_response_retry_limit, langsmith_*, embedding_*, query_rewriter_enabled, learning_*).

Preserved verbatim: dark theme CSS, MCP_CATALOG (18 servers), OAuth popup flow (Notion + Google Workspace + Threads modals), esc() helper, TOML output for all [[mcp_servers]] sections, /api/save-config and /api/load-config endpoints.

…th resolution

- Rewrite setup/index.html with progressive disclosure (4 steps + success)
- Add global 'Show all settings' toggle for tech users
- Add collapsible advanced sections with inline validation
- Add syncStateToForm() to load existing config into wizard fields
- Resolve config path from RUSTFOX_HOME → RUSTFOX_ROOT → ~/.rustfox → cwd
- Add --advanced flag to CLI wizard for full settings control
- Extend ConfigParams/format_config with all config sections
- Remove sandbox.allowed_directory from wizard (auto-resolved to workspace)
- Add Raw* parse structs for all config sections
- Add 24 new tests for format_config sections
- Add spec and plan documents
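
The config path fallback chain (RUSTFOX_HOME → RUSTFOX_ROOT → ~/.rustfox → cwd) can be sketched as a pure function. This is a guess at the logic rather than the PR's code: the function name is hypothetical, the environment lookup is passed in as a closure to keep the function testable, and checking that ~/.rustfox exists before choosing it is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Pick the directory that holds the config file.
/// `env` looks up an environment variable (pass `|k| std::env::var(k).ok()`
/// in real use); `home` is the user's home directory if known.
fn resolve_config_dir(
    env: impl Fn(&str) -> Option<String>,
    home: Option<&Path>,
    cwd: &Path,
) -> PathBuf {
    // 1 and 2: explicit overrides, in priority order. Empty values are ignored.
    for var in ["RUSTFOX_HOME", "RUSTFOX_ROOT"] {
        if let Some(value) = env(var).filter(|v| !v.is_empty()) {
            return PathBuf::from(value);
        }
    }
    // 3: ~/.rustfox, used only when the directory already exists (assumption).
    if let Some(dir) = home.map(|h| h.join(".rustfox")).filter(|d| d.is_dir()) {
        return dir;
    }
    // 4: fall back to the current working directory.
    cwd.to_path_buf()
}
```

Injecting the lookup instead of calling `std::env::var` directly lets each step of the chain be exercised in tests without mutating the process environment.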

Initial plan

7f8a451

Copilot AI assigned Copilot and chinkan Mar 25, 2026

Copilot started work on behalf of chinkan March 25, 2026 10:16

Copilot AI changed the title ~~[WIP] Add support for processing files and images from Telegram~~ feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline Mar 25, 2026

Copilot AI requested a review from chinkan March 25, 2026 10:21

Copilot finished work on behalf of chinkan March 25, 2026 10:21

Copilot started work on behalf of chinkan March 25, 2026 10:30

Copilot AI and others added 12 commits March 25, 2026 10:39

docs: add implementation plan for telegram file/image support with oc…

73e65a2

…rs OCR Co-authored-by: chinkan <[email protected]> Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1

feat: add file processing dependencies

cbb559c

Co-authored-by: Copilot <[email protected]>

feat: add Attachment type to IncomingMessage

24f5a41

Co-authored-by: Copilot <[email protected]>

fix: use to_string_lossy() in ocr test for UTF-8 safety

3c969f9

Co-authored-by: Copilot <[email protected]>

refactor: avoid double chars().count() in handle_context_length; add …

1215e53

…comment on pdf unwrap_or_default Co-authored-by: Copilot <[email protected]>

fix: use tokio::fs::remove_dir_all for async temp dir cleanup

3fc8d16

Co-authored-by: Copilot <[email protected]>

feat: agent processes file attachments

1f1fbb2

Co-authored-by: Copilot <[email protected]>

test: add MessageContent serialization tests, fix clippy collapsible_…

12e9337

…match Co-authored-by: chinkan <[email protected]> Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1

Copilot finished work on behalf of chinkan March 25, 2026 11:23

chinkan added 2 commits June 4, 2026 11:12

Merge branch 'main' into copilot/add-telegram-file-image-support

41d1d7b

refactor: update content handling in ChatMessage and related modules

11e26cd

chinkan requested a review from Copilot June 4, 2026 03:20

Copilot started reviewing on behalf of chinkan June 4, 2026 03:20

Copilot AI reviewed Jun 4, 2026


chinkan added 2 commits June 4, 2026 12:12

feat(memory): add is_archived column to conversations

811ad95

feat(memory): filter archived conversations in get_or_create

42fecfb

chinkan added 16 commits June 4, 2026 12:20

feat(memory): soft archive on clear_conversation instead of delete

9e31c8f

feat(memory): skip archived conversations in load_messages

7c5e472

feat(mcp): add server_count() method

7d22149

feat(telegram): update /clear text to reflect archive behavior

3abb1c5

test(memory): verify archived messages remain searchable

47d6e92

feat(telegram): add startup and shutdown notification functions

eb1c4af

feat(main): add graceful shutdown with Telegram notification

c5d2e43

style: fix formatting across multiple files

98e92a3

fix(embeddings): pass dimensions to API and validate response

3323fa3

style: fix formatting in embeddings.rs

bf78e7a

feat: implement long-term memory soft archive and startup/shutdown no…

aabee38

…tifications

fix fmt issue

06873dd

chinkan marked this pull request as ready for review June 11, 2026 06:46

chinkan changed the title ~~feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline~~ feat: Telegram file/image support, long-term memory, parallel ad-hoc subagents, zero-trust verifier Jun 11, 2026

chinkan added 4 commits June 11, 2026 15:15

chinkan merged commit 0ac336a into main Jun 12, 2026
5 checks passed

chinkan merged 37 commits into main from copilot/add-telegram-file-image-support

Copilot AI commented Mar 25, 2026 •

edited by chinkan

Loading

Uh oh!

chinkan commented Mar 25, 2026

Uh oh!

Copilot AI commented Mar 25, 2026 •

edited

Loading

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Copilot AI commented Mar 25, 2026 • edited by chinkan Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Features

Telegram File/Image Support

Long-Term Memory & Startup/Shutdown

Parallel Ad-Hoc Subagents

Zero-Trust Verifier

skip_bootstrap Frontmatter

Context Compaction Improvements

Code Quality

New Dependencies

Uh oh!

chinkan commented Mar 25, 2026

Uh oh!

Copilot AI commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Mar 25, 2026 •

edited by chinkan

Loading

`skip_bootstrap` Frontmatter

Copilot AI commented Mar 25, 2026 •

edited

Loading