
Add vision: pipe attached Slack images into the LLM #26

@scarolan

Both Ollama and Gemini accept images natively, and Slack messages with file attachments carry the file metadata on the event payload. Roughly 30 lines of glue would unlock a whole class of features.

Use cases

  • "Data, what's in this screenshot?"
  • "Data, what's broken about this stack trace?"
  • "Identify this Star Trek species."
  • "Translate the text in this photo."

Plan

  1. In the message handler, check message.files for image MIME types (image/png, image/jpeg, image/webp).
  2. Fetch each one: call app.client.files.info to get the file's url_private, download that URL with the bot token in an Authorization: Bearer header, and base64-encode the bytes.
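  3. Extend the canonical message shape to allow images on user turns. Each entry should carry its MIME type as well as the base64 data (e.g. images: [{ mimeType, data }]), since Gemini's inlineData needs a mimeType while Ollama only takes the base64 string.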
  4. Adapter translation:
    • Ollama: native — { role: 'user', content, images: [...] }
    • Gemini: parts: [{ text }, { inlineData: { mimeType, data } }]
  5. Persist only the message text in convoStore, not the image bytes; the assistant's reply implicitly captures what was discussed.
  6. Document in BOT_PERSONALITY that Data can see images.
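Steps 1 and 2 could look roughly like the following JavaScript sketch. It assumes Node 18+ (for the global fetch and Buffer), a Slack WebClient such as Bolt's app.client, and Slack file objects that expose mimetype and url_private. The helper names pickImageFiles and fetchImagesAsBase64 are hypothetical, not existing project code. Each result keeps its MIME type next to the base64 string because Gemini's inlineData needs it.

```javascript
// Plan steps 1-2, sketched under assumptions: Node 18+ (global fetch/Buffer),
// a Slack WebClient such as Bolt's app.client, and Slack file objects that
// expose `mimetype` and `url_private`. Helper names are hypothetical.

// Image MIME types listed in the plan.
const IMAGE_MIME_TYPES = new Set(['image/png', 'image/jpeg', 'image/webp']);

// Step 1: keep only attachments with a supported image MIME type.
function pickImageFiles(message) {
  return (message.files || []).filter((f) => IMAGE_MIME_TYPES.has(f.mimetype));
}

// Step 2: look up each file, download it with the bot token, base64-encode it.
// `fetchImpl` defaults to the global fetch; tests can pass a stub instead.
async function fetchImagesAsBase64(client, botToken, files, fetchImpl = fetch) {
  const images = [];
  for (const f of files) {
    const { file } = await client.files.info({ file: f.id });
    const res = await fetchImpl(file.url_private, {
      headers: { Authorization: `Bearer ${botToken}` },
    });
    if (!res.ok) {
      throw new Error(`download of ${f.id} failed: HTTP ${res.status}`);
    }
    const bytes = Buffer.from(await res.arrayBuffer());
    // Keep the MIME type next to the data: Gemini's inlineData needs it.
    images.push({ mimeType: file.mimetype, data: bytes.toString('base64') });
  }
  return images;
}

// Hypothetical wiring in a Bolt listener (SLACK_BOT_TOKEN is an assumed env var):
// app.message(async ({ message, client }) => {
//   const images = await fetchImagesAsBase64(
//     client, process.env.SLACK_BOT_TOKEN, pickImageFiles(message));
// });
```

The bot token needs the files:read scope for both the files.info call and the download. Passing fetchImpl explicitly keeps the download path testable without touching Slack.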
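Steps 3 through 5 reduce to a few pure functions over the canonical message. The sketch below assumes a canonical shape of { role: 'user' | 'assistant', content, images?: [{ mimeType, data }] } with base64 data; that shape and the names toOllamaMessage, toGeminiContent and toPersisted are illustrative assumptions, not the bot's existing types. The Ollama side follows the chat API's per-message images array of base64 strings; the Gemini side builds parts with inlineData, as in the plan.

```javascript
// Plan steps 3-5, sketched. Assumed canonical message shape:
//   { role: 'user' | 'assistant', content: string,
//     images?: [{ mimeType: string, data: string /* base64 */ }] }
// All function names are hypothetical.

// Ollama chat messages take bare base64 strings in an `images` array.
function toOllamaMessage(msg) {
  const out = { role: msg.role, content: msg.content };
  if (msg.images && msg.images.length > 0) {
    out.images = msg.images.map((img) => img.data);
  }
  return out;
}

// Gemini takes a list of parts; each image becomes an inlineData part that
// carries its MIME type. Gemini names the assistant role 'model'.
function toGeminiContent(msg) {
  // Skip the text part for image-only messages with empty content.
  const parts = msg.content ? [{ text: msg.content }] : [];
  for (const img of msg.images || []) {
    parts.push({ inlineData: { mimeType: img.mimeType, data: img.data } });
  }
  return { role: msg.role === 'assistant' ? 'model' : 'user', parts };
}

// Step 5: the record written to convoStore keeps text only; image bytes are dropped.
function toPersisted(msg) {
  return { role: msg.role, content: msg.content };
}
```

Keeping the adapters as pure functions means the same canonical turn can be sent to either backend, and the persisted history never grows by the size of the images.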

Out of scope (for now)

Image generation in mid-conversation (covered by tool calling once #1 lands).

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
