Both Ollama and Gemini accept images natively, and Slack includes file attachments in the message event payload. About 30 lines of glue code would unlock a whole class of features.
Use cases
- "Data, what's in this screenshot?"
- "Data, what's broken about this stack trace?"
- "Identify this Star Trek species."
- "Translate the text in this photo."
Plan
- In the message handler, check message.files for image MIME types (image/png, image/jpeg, image/webp).
- Look up each file via app.client.files.info, download its URL with the bot token in the Authorization header, and base64-encode the bytes.
- Extend the canonical message shape to allow images: [base64] on user turns.
- Adapter translation:
  - Ollama (native): { role: 'user', content, images: [...] }
  - Gemini: parts: [{ text }, { inlineData: { mimeType, data } }]
- Persist message text only in convoStore (not the image bytes); the assistant's reply implicitly captures what was discussed.
- Document in BOT_PERSONALITY that Data can see images.
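The plan above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: every type and function name here (SlackFile, ImageAttachment, CanonicalMessage, pickImageFiles, fetchImage, toOllamaMessage, toGeminiContent, stripImages) is hypothetical. It also assumes the canonical shape keeps the MIME type next to each base64 string, since Gemini's inlineData needs it, which is a small departure from the bare images: [base64] shape. File objects are assumed to come from message.files or app.client.files.info.

```typescript
// Sketch only: all names below are hypothetical and not taken from the codebase.
const IMAGE_MIME_TYPES = ["image/png", "image/jpeg", "image/webp"];

// Subset of the Slack file object fields this sketch relies on.
interface SlackFile { id: string; mimetype?: string; url_private?: string }

// Assumption: keep the MIME type with the bytes because Gemini requires it.
interface ImageAttachment { mimeType: string; data: string /* base64 */ }

interface CanonicalMessage {
  role: "user" | "assistant";
  content: string;
  images?: ImageAttachment[];
}

// Step 1: keep only attachments with a supported image MIME type and a URL.
function pickImageFiles(files: SlackFile[] | undefined): SlackFile[] {
  return (files ?? []).filter(
    (f) => !!f.url_private && !!f.mimetype && IMAGE_MIME_TYPES.includes(f.mimetype),
  );
}

// Base64-encode raw bytes without relying on Node-specific APIs.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Step 2: download the private file URL, authenticating with the bot token.
async function fetchImage(file: SlackFile, botToken: string): Promise<ImageAttachment> {
  if (!file.url_private || !file.mimetype) throw new Error(`File ${file.id} has no URL or MIME type`);
  const res = await fetch(file.url_private, {
    headers: { Authorization: `Bearer ${botToken}` },
  });
  if (!res.ok) throw new Error(`Slack file download failed: ${res.status}`);
  return { mimeType: file.mimetype, data: toBase64(new Uint8Array(await res.arrayBuffer())) };
}

// Step 4a: Ollama takes base64 strings in an `images` array on the message.
interface OllamaMessage { role: string; content: string; images?: string[] }

function toOllamaMessage(msg: CanonicalMessage): OllamaMessage {
  const out: OllamaMessage = { role: msg.role, content: msg.content };
  if (msg.images && msg.images.length > 0) out.images = msg.images.map((img) => img.data);
  return out;
}

// Step 4b: Gemini takes a text part followed by one inlineData part per image.
type GeminiPart = { text: string } | { inlineData: { mimeType: string; data: string } };

function toGeminiContent(msg: CanonicalMessage): { role: "user" | "model"; parts: GeminiPart[] } {
  const parts: GeminiPart[] = [{ text: msg.content }];
  for (const img of msg.images ?? []) {
    parts.push({ inlineData: { mimeType: img.mimeType, data: img.data } });
  }
  // Gemini names the assistant role "model".
  return { role: msg.role === "assistant" ? "model" : "user", parts };
}

// Step 5: drop image bytes before persisting, keeping only role and text.
function stripImages(msg: CanonicalMessage): CanonicalMessage {
  return { role: msg.role, content: msg.content };
}
```

In a handler, the pieces would chain as pickImageFiles(message.files), then fetchImage for each result, then the adapter for whichever backend is active, with stripImages applied before the turn is written to convoStore.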
Out of scope (for now)
Image generation mid-conversation (covered by tool calling once #1 lands).