Add Ollama client tests

## 🧠 Context

The repository layer has tests (`tests/infrastructure/db/`), but the rest of the service code has none yet. The Ollama client in `src/infrastructure/llm.py` is a good first target: it is small, stable (the Ollama HTTP contract isn't changing), and easy to test by mocking the HTTP call, so no Ollama instance is needed. This ticket adds unit tests for both of its functions:

* **`embed()`** — has the real logic: the task-specific **prefix** it prepends, and the **embedding-dimension validation**.
* **`chat()`** — a thin passthrough, but worth a basic regression test on its request payload and response parsing.

---

## 🛠 Implementation Plan

1. Create `tests/infrastructure/test_llm.py` (the `tests/infrastructure/` package already exists).

2. Mock the HTTP call so the tests don't touch the network or a real Ollama instance. One detail to get right: `llm.py` uses the client as `async with httpx.AsyncClient(...) as client:` and then `await client.post(...)`, but it calls `response.json()` and `response.raise_for_status()` directly — those are **not** awaited. So in the mock:
   * the client, and the `__aenter__` from the `async with`, must be **async** mocks (`AsyncMock`), because they're awaited;
   * the response object must be a **sync** mock (`MagicMock`), because `.json()` and `.raise_for_status()` are called directly.

   If the response is made an `AsyncMock` instead, `response.json()` returns a coroutine rather than your data and the assertions won't match. Wrap this setup in a small **module-level helper function** so each test stays short — use a plain helper, not a `conftest.py` fixture, since there are only about five tests. Everything needed is in the standard-library `unittest.mock` (`AsyncMock`, `MagicMock`, `patch`) — don't add a test dependency. Reference: [unittest.mock.AsyncMock](https://docs.python.org/3/library/unittest.mock.html#unittest.mock.AsyncMock).

3. Cover these behaviors in tests (and optionally any others you think should be included):
   * **`embed` prefix:** with `is_query=True` the `prompt` in the payload starts with the query prefix; with `is_query=False` it starts with the document prefix. **Assert against the constants** `llm.EMBED_QUERY_PREFIX` / `llm.EMBED_DOCUMENT_PREFIX`, not the literal `"search_query: "` strings — asserting the literal just duplicates the source and breaks whenever the prefixes are changed. Then add **one** separate assertion pinning `EMBED_DOCUMENT_PREFIX == "search_document: "` to lock the nomic prefix value in a single place.
   * **`embed` endpoint + model:** `embed` posts to `/api/embeddings` using the configured **embedding** model. This is symmetric with the chat test and confirms it isn't accidentally sending the chat model.
   * **`embed` dimension validation:** an embedding whose length equals `settings.embedding_dim` is returned unchanged; a wrong-length one raises `ValueError`. Assert **only** that `ValueError` is raised (optionally `pytest.raises(ValueError, match="mismatch")`) — do **not** pin the full error message; it's worded for humans and may be reworded. Size the fake embeddings off `settings.embedding_dim` (the repository tests do the same).
   * **`chat` payload + parsing:** the request payload contains the configured chat model and `stream=False`, and the function returns the response's `message.content`. This guards the payload shape and the response-parsing key.
   * **HTTP errors propagate:** with the response's `raise_for_status` set to `side_effect=httpx.HTTPStatusError(...)`, the call raises — confirming neither `embed` nor `chat` swallows HTTP errors.
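
The mock shape from step 2 and the error-propagation check can be sketched roughly as follows. This is a minimal illustration, not the final test file: `make_mock_client`, `post_json`, and `FakeHTTPStatusError` are hypothetical names, and `post_json` is only a stand-in for the request pattern described above (the real tests would call `llm.embed` / `llm.chat` and use `httpx.HTTPStatusError`). It uses only the standard library so it runs on its own.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_mock_client(json_body):
    """Return (client, response) mocks shaped like the httpx usage described above."""
    # Sync mock: .json() and .raise_for_status() are called without await.
    response = MagicMock()
    response.json.return_value = json_body

    # Async mock: `await client.post(...)` resolves to the sync response.
    client = AsyncMock()
    client.post.return_value = response
    # `async with ... as client` awaits __aenter__, so hand back the same mock.
    client.__aenter__.return_value = client
    return client, response


class FakeHTTPStatusError(Exception):
    """Stand-in for httpx.HTTPStatusError so this sketch needs no third-party import."""


async def post_json(client_factory, url, payload):
    """Hypothetical stand-in for the async-with / post / json pattern in llm.py."""
    async with client_factory() as client:
        response = await client.post(url, json=payload)
        response.raise_for_status()
        return response.json()


# Happy path: the parsed body comes back and the request can be inspected.
client, response = make_mock_client({"embedding": [0.1, 0.2]})
body = asyncio.run(post_json(lambda: client, "/api/embeddings", {"prompt": "hi"}))
sent_url = client.post.await_args.args[0]
sent_payload = client.post.await_args.kwargs["json"]

# Error path: an exception from raise_for_status must reach the caller.
err_client, err_response = make_mock_client({})
err_response.raise_for_status.side_effect = FakeHTTPStatusError("server error")
try:
    asyncio.run(post_json(lambda: err_client, "/api/embeddings", {}))
    propagated = False
except FakeHTTPStatusError:
    propagated = True
```

In the actual tests, the `client` mock would presumably be swapped in with `patch(..., return_value=client)` on `httpx.AsyncClient`; the exact patch target depends on how `llm.py` imports `httpx`, so check that before copying the pattern.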

---

## 📝 Notes

* `is_query` is **keyword-only** (note the `*` in `embed`'s signature) — call `embed(text, is_query=True)`, never positionally.
* Match the repo's async-test convention: `asyncio_mode` is `"auto"` (see `pyproject.toml`), so a plain `async def test_...` works with **no `@pytest.mark.asyncio` marker**, just as in the existing repository tests. The only difference is that these tests mock `httpx` instead of using a DB fixture.
* **These tests need no Docker or Ollama** (they mock the network). The full `make test` run still needs Docker for the existing DB tests, so run it with Docker available before pushing.
* **Do not** test malformed-response handling (a missing `embedding` / `message` key): there's no graceful path by design, so such a test would only assert `KeyError`, which isn't worth it.
* Pure addition — **no changes to `src/infrastructure/llm.py`**. If you find you need to tweak `llm.py` to make it testable, keep the change minimal and call it out in the PR.
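
To make the keyword-only note and the "assert against the constants" guidance concrete, here is a small sketch. The `embed` below is a hypothetical stand-in that only builds the prompt string; the real function lives in `llm.py` and its body is not reproduced here. The prefix values are the ones quoted in this ticket, and whether the real `is_query` has a default is not assumed.

```python
import asyncio

# Values quoted in this ticket; the real constants are defined in llm.py.
EMBED_QUERY_PREFIX = "search_query: "
EMBED_DOCUMENT_PREFIX = "search_document: "


async def embed(text: str, *, is_query: bool) -> str:
    """Hypothetical stand-in: returns only the prompt the real embed() would send."""
    prefix = EMBED_QUERY_PREFIX if is_query else EMBED_DOCUMENT_PREFIX
    return prefix + text


# Keyword calls work, and the assertions track the constants, not literals.
query_prompt = asyncio.run(embed("hello", is_query=True))
assert query_prompt.startswith(EMBED_QUERY_PREFIX)

doc_prompt = asyncio.run(embed("hello", is_query=False))
assert doc_prompt.startswith(EMBED_DOCUMENT_PREFIX)

# A positional call is rejected at call time, before any coroutine is created.
try:
    embed("hello", True)
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```

Because the `*` makes `is_query` keyword-only, a positional call fails with `TypeError` immediately, which is why the tests should always spell out `is_query=...`.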

---

## ✅ Acceptance Criteria

* `tests/infrastructure/test_llm.py` exists and covers:
   * **`embed`:** prefix asserted via the `EMBED_*` constants (plus one assertion pinning `EMBED_DOCUMENT_PREFIX`); posts to `/api/embeddings` with the embedding model; correct-dimension embedding returned; wrong-dimension embedding raises `ValueError` (message not pinned).
   * **`chat`:** payload uses the configured chat model with `stream=False`, and `message.content` is returned.
   * **HTTP errors:** an error-propagation test covering both functions.
* Tests mock `httpx` — no real network calls; runnable without Ollama.
* No new dependencies added.
* `make test` and `make lint` pass.
