lightpanda-io · krichprollsch · Jun 18, 2026 · Jun 18, 2026
diff --git a/src/content/run-locally/commands/agent.mdx b/src/content/run-locally/commands/agent.mdx
@@ -29,7 +29,7 @@ Options:
                            When omitted, lightpanda auto-detects an API key
                            from your environment (ANTHROPIC_API_KEY,
                            OPENAI_API_KEY, GOOGLE_API_KEY/GEMINI_API_KEY,
-                           HF_TOKEN).
+                           HF_TOKEN, AI_GATEWAY_API_KEY, MISTRAL_API_KEY).
                            With exactly one key set: that provider is used.
                            With multiple keys on a TTY: you'll be prompted
                            to pick; in non-interactive contexts, pass
@@ -38,12 +38,15 @@ Options:
                            natural-language input, no LOGIN /
                            ACCEPT_COOKIES keywords).
 
-                           ollama is never auto-detected (it needs no key);
-                           select it explicitly with --provider ollama.
+                           Local servers (ollama, llama_cpp) are never
+                           auto-detected (they need no key); select them
+                           explicitly with --provider ollama / --provider
+                           llama_cpp.
 
                            Allowed values:
                              "anthropic", "openai", "gemini",
-                             "huggingface", "ollama".
+                             "huggingface", "vercel", "mistral",
+                             "ollama", "llama_cpp".
                            In the REPL, use /provider to list and change
                            providers.
 
@@ -61,6 +64,7 @@ Options:
 --base-url <URL>           Override the API base URL for the provider.
                            Defaults to the provider's standard endpoint.
                            Ollama default: http://localhost:11434/v1.
+                           llama.cpp default: http://localhost:8080/v1.
                            Hugging Face default is the serverless router
                            (https://router.huggingface.co/v1); point this
                            at a dedicated Inference Endpoint to use one.
@@ -98,8 +102,10 @@ Options:
 --effort <LEVEL>           Per-turn reasoning budget, mapped to each
                            provider's native thinking/reasoning knob.
                            Default: low in the REPL (snappy turns),
-                           medium in one-shot --task mode. In the REPL,
-                           use /effort to change it.
+                           medium in one-shot --task mode, unless the
+                           provider sets its own default (Mistral defaults
+                           to none, as its default model rejects effort).
+                           In the REPL, use /effort to change it.
 
                            Allowed values:
                              none, minimal, low, medium, high, xhigh.
@@ -108,8 +114,9 @@ The provider, model, effort, and verbosity you choose in the REPL are
 remembered per-directory in .lp-agent.zon and reused on the next run.
 
 API keys are read from the environment: ANTHROPIC_API_KEY, OPENAI_API_KEY,
-GOOGLE_API_KEY/GEMINI_API_KEY, or HF_TOKEN. Ollama does not require an API
-key.
+GOOGLE_API_KEY/GEMINI_API_KEY, HF_TOKEN, AI_GATEWAY_API_KEY, or
+MISTRAL_API_KEY. The local servers (Ollama, llama.cpp) do not require an
+API key.
 
 common options:
   --insecure-disable-tls-host-verification
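
Note on the help text in this file: the key auto-detection rule it describes can be sketched roughly as below. This is a minimal Python illustration, not lightpanda's implementation. `detect_providers` and `auto_select` are hypothetical names, the variable-to-provider mapping is taken from the provider table in usage/agent.mdx, and returning `None` when several keys are set is an assumption standing in for the interactive prompt or an explicit `--provider`.

```python
# Hypothetical sketch of key auto-detection; names are illustrative only.
# Environment variable -> provider value, in the order the help text lists them.
KEY_TO_PROVIDER = [
    ("ANTHROPIC_API_KEY", "anthropic"),
    ("OPENAI_API_KEY", "openai"),
    ("GOOGLE_API_KEY", "gemini"),
    ("GEMINI_API_KEY", "gemini"),
    ("HF_TOKEN", "huggingface"),
    ("AI_GATEWAY_API_KEY", "vercel"),
    ("MISTRAL_API_KEY", "mistral"),
]


def detect_providers(env):
    """Return the providers that have a non-empty key in env, without duplicates."""
    found = []
    for var, provider in KEY_TO_PROVIDER:
        if env.get(var) and provider not in found:
            found.append(provider)
    return found


def auto_select(env):
    """Return the provider when exactly one key is set, else None.

    None covers both "no key" and "several keys"; in the second case the
    caller would have to prompt or require an explicit provider (assumption).
    Local servers (ollama, llama_cpp) never appear here because they need no key.
    """
    found = detect_providers(env)
    return found[0] if len(found) == 1 else None
```

In a script this could be called as `auto_select(dict(os.environ))` after `import os`; both Gemini variables map to the same provider, so setting both still counts as one key.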

diff --git a/src/content/usage/agent.mdx b/src/content/usage/agent.mdx
@@ -36,7 +36,8 @@ Set an API key for your preferred LLM provider:
 export ANTHROPIC_API_KEY=<YOUR_ANTHROPIC_KEY>
 ```
 
-Or `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `HF_TOKEN`, or a local LLM through an Ollama server.
+Or set `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `HF_TOKEN`, `AI_GATEWAY_API_KEY`, or
+`MISTRAL_API_KEY`, or run a local LLM through an Ollama or llama.cpp server.
 
 Launch the REPL:
 
@@ -81,13 +82,19 @@ The agent needs an LLM to interpret natural language. Set the relevant API
 key as an environment variable, or pass `--provider` explicitly, or set `/provider`
 while on the REPL.
 
-| Provider  | Flag                   | API key env                          |
-|-----------|------------------------|--------------------------------------|
-| Anthropic | `--provider anthropic` | `ANTHROPIC_API_KEY`                  |
-| OpenAI    | `--provider openai`    | `OPENAI_API_KEY`                     |
-| Gemini    | `--provider gemini`    | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
-| Hugging Face    | `--provider huggingface`    | `HF_TOKEN`|
-| Ollama    | `--provider ollama`    | none (local)                         |
+| Provider          | Flag                     | API key env                          |
+|-------------------|--------------------------|--------------------------------------|
+| Anthropic         | `--provider anthropic`   | `ANTHROPIC_API_KEY`                  |
+| OpenAI            | `--provider openai`      | `OPENAI_API_KEY`                     |
+| Gemini            | `--provider gemini`      | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
+| Hugging Face      | `--provider huggingface` | `HF_TOKEN`                           |
+| Vercel AI Gateway | `--provider vercel`      | `AI_GATEWAY_API_KEY`                 |
+| Mistral           | `--provider mistral`     | `MISTRAL_API_KEY`                    |
+| Ollama            | `--provider ollama`      | none (local)                         |
+| llama.cpp         | `--provider llama_cpp`   | none (local)                         |
+
+Set `HF_BILL_TO` to an organization name to bill Hugging Face requests to it
+rather than to your personal account.
 
 You can set the provider explicitly with the CLI option `--provider` or the REPL command `/provider`.
 Otherwise the agent will pick one in this order:
@@ -97,11 +104,14 @@ Otherwise the agent will pick one in this order:
    set.
 2. **Auto-detected** - the first key found in priority order
    (`ANTHROPIC_API_KEY` → `GOOGLE_API_KEY`/`GEMINI_API_KEY` →
-   `OPENAI_API_KEY` → `HF_TOKEN`). With several keys on the REPL, you'll be prompted to
+   `OPENAI_API_KEY` → `HF_TOKEN` → `AI_GATEWAY_API_KEY` →
+   `MISTRAL_API_KEY`). With several keys on the REPL, you'll be prompted to
    pick.
-3. **Local** - if no cloud key is set, the agent probes
-   `http://localhost:11434/v1` (Ollama default server endpoint)
-   and uses it if there's at least one model pulled.
+3. **Local** - if no cloud key is set, the agent probes the local
+   servers (Ollama at `http://localhost:11434/v1`, llama.cpp at
+   `http://localhost:8080/v1`) and uses one that has at least one model
+   loaded. Local servers need no key, so no environment variable selects them;
+   to choose one explicitly, pass `--provider ollama` or `--provider llama_cpp`.
    You can change the server URL with the `--base-url` CLI option.
 4. **No provider at all** - If the CLI option `--no-llm` is set
    it falls back to the basic REPL (slash commands only).
@@ -118,8 +128,12 @@ The CLI option `--list-models` or pressing TAB on the REPL command `/model` prin
 The CLI option `--effort <none|minimal|low|medium|high|xhigh>` or the REPL command `/effort`
 sets the per-turn reasoning budget for thinking models.
 It maps to each provider's native reasoning-effort knob and is ignored by non-thinking models.
-The REPL defaults to `low` so turns stay snappy.
+Effort resolves in order: an explicit `--effort` flag, then the value remembered
+in `.lp-agent.zon`, then the provider's own default, then the mode default.
+The REPL mode default is `low` so turns stay snappy.
 `--task` defaults to `medium` where answer quality matters more than per-turn latency.
+A few providers set their own default, which takes precedence over the mode default;
+Mistral defaults to `none` because its default model rejects reasoning effort.
 Higher effort can mean fewer tool calls per task (the model plans better),
 so it's a real tradeoff rather than a pure slowdown.
 Effort selection persists in `.lp-agent.zon`.
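
Note on the effort paragraph above: the resolution order it states (explicit flag, then remembered value, then provider default, then mode default) can be sketched as below. This is a minimal Python illustration under stated assumptions, not lightpanda's code. `resolve_effort` and the two lookup tables are hypothetical names; only the Mistral default and the two mode defaults come from the documentation.

```python
# Hypothetical sketch of effort resolution; names are illustrative only.
ALLOWED_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")

# Mode defaults from the docs: REPL is low, one-shot --task is medium.
MODE_DEFAULT = {"repl": "low", "task": "medium"}

# Provider defaults that win over the mode default; the docs name only Mistral.
PROVIDER_DEFAULT = {"mistral": "none"}


def resolve_effort(mode, provider, flag=None, remembered=None):
    """Return the first value that is set, in the documented order."""
    candidates = (
        flag,                            # explicit --effort
        remembered,                      # value stored in .lp-agent.zon
        PROVIDER_DEFAULT.get(provider),  # provider's own default, if any
        MODE_DEFAULT[mode],              # mode default
    )
    for value in candidates:
        if value is not None:
            if value not in ALLOWED_EFFORTS:
                raise ValueError(f"unknown effort level: {value}")
            return value
```

The string `"none"` is a real effort level here and is distinct from Python's `None`, which marks a source that has no value.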