feat(db): capture raw model output + accurate failure messages by Colinho22 · Pull Request #72 · Colinho22/maestro

Colinho22 · 2026-06-20T22:55:34Z

Summary

A tier-3 cross-provider check surfaced that failed cells discarded the model
output that caused the failure (e.g. gemini-3.5-flash returning malformed JSON
on large extractions), leaving only the error string. The actual output was
only recoverable from the provider's own console. This PR persists it.

Changes

- raw_response column on run_results and sub_results: the unprocessed
  model output, retained even when the cell fails (when output_diagram_code /
  output_text is None) and across retries. Populated by all five providers
  and all three multi-step strategies. This makes every failure analysable
  after the run without re-calling the model.
- Accurate empty-output failures: a provider returning success=False with
  no error string (an empty/blank diagram) is now recorded as "empty output
  from provider" instead of the misleading "No attempts executed".
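A minimal sketch of the retry pattern described above, under simplifying assumptions: `ProviderResult`, `StepResult`, `run_step` and `parse_json` are hypothetical stand-ins, not maestro's actual classes (the real `SubResult` carries more fields). A `last_raw` accumulator is overwritten on every attempt, so a cell that fails all retries still keeps the output of its final attempt, and a blank failure gets the new message instead of the "No attempts executed" sentinel.

```python
from __future__ import annotations

import json
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class ProviderResult:
    """Hypothetical stand-in for what a provider call returns."""

    success: bool
    output: str | None = None
    error: str | None = None
    raw_response: str | None = None  # unprocessed model text


@dataclass
class StepResult:
    """Hypothetical stand-in for a persisted SubResult row."""

    output_text: str | None
    raw_response: str | None
    error: str | None
    attempts: int


def parse_json(text: str) -> str:
    """Hypothetical cleaner: raise on malformed JSON, else re-serialise it."""
    return json.dumps(json.loads(text))


def run_step(
    call: Callable[[], ProviderResult],
    parse: Callable[[str], str],
    max_attempts: int = 3,
) -> StepResult:
    """Retry one step, keeping the latest raw output even if every attempt fails."""
    last_raw: str | None = None
    last_error = "No attempts executed"  # only survives if the loop never runs
    attempts = 0
    for attempt in range(1, max_attempts + 1):
        attempts = attempt
        result = call()
        last_raw = result.raw_response  # updated on every attempt
        if not result.success:
            # success=False with no error string means blank output: say so.
            last_error = result.error or "empty output from provider"
            continue
        try:
            cleaned = parse(result.output or "")
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = f"parse failed: {exc}"
            continue
        return StepResult(cleaned, last_raw, None, attempts)
    # Cleaned output is None, but the model text that caused the failure survives.
    return StepResult(None, last_raw, last_error, attempts)
```

Because `last_raw` is overwritten rather than appended, only the final attempt's text is stored; keeping every attempt would need a list or a separate table.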

Design notes

- raw_response stores the model text only, not the request envelope. HTTP
  status (2xx/4xx/5xx) is deliberately omitted: the SDKs do not expose it on the
  200-success path, and the failures of interest (malformed content) are HTTP
  200s, so a status column would be inferred and misleading. raw_response plus
  the existing error string fully capture each failure.
- The schema is defined in code, so a fresh DB gets the columns automatically.
  The full 6000-cell DB is ~70-90 MB (current is 2.3 MB for 382 cells), well
  within SQLite's limits.
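A rough sanity check of the size estimate, assuming the 2.3 MB DB predates the new column and that size scales linearly with cell count (this ignores SQLite's fixed page overhead and the fact that large extractions produce bigger cells). The helper names are illustrative only.

```python
MB = 1_000_000  # decimal megabytes


def linear_db_size(current_bytes: float, current_cells: int, target_cells: int) -> float:
    """Scale a DB size by cell count, assuming a constant per-cell cost."""
    return current_bytes / current_cells * target_cells


def raw_bytes_per_cell(total_bytes: float, baseline_bytes: float, cells: int) -> float:
    """Per-cell bytes left for raw_response once the baseline is subtracted."""
    return (total_bytes - baseline_bytes) / cells


baseline = linear_db_size(2.3 * MB, 382, 6000)  # existing columns only
print(f"baseline: {baseline / MB:.1f} MB")
for total_mb in (70, 90):
    per_cell = raw_bytes_per_cell(total_mb * MB, baseline, 6000)
    print(f"{total_mb} MB total -> ~{per_cell / 1000:.1f} KB of raw text per cell")
```

On those assumptions the existing columns account for roughly 36 MB of the full matrix, so the 70-90 MB figure budgets on the order of 6-9 KB of raw model text per cell.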

Testing

- New test_raw_response_survives_a_failed_cell: a failed cell round-trips
  through the DB keeping its raw output while the cleaned output is None.
- Full suite: 259 passed. ruff clean.
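A self-contained sketch of what such a round-trip check asserts, using an in-memory SQLite DB and a trimmed, hypothetical `run_results` schema. This is not the actual test in `tests/db/test_client.py`; the real table has more columns and `check_raw_response_round_trip` is an illustrative name.

```python
import sqlite3

# Hypothetical, trimmed schema: only the columns this check needs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS run_results (
    id INTEGER PRIMARY KEY,
    output_diagram_code TEXT,
    output_text TEXT,
    error TEXT,
    raw_response TEXT  -- unprocessed model output, kept even on failure
);
"""


def check_raw_response_round_trip() -> None:
    """Write a failed cell, read it back, and compare every column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    malformed = '{"entities": [{"name": "Order"'  # truncated JSON from the model
    conn.execute(
        "INSERT INTO run_results"
        " (output_diagram_code, output_text, error, raw_response)"
        " VALUES (?, ?, ?, ?)",
        (None, None, "invalid JSON", malformed),
    )
    row = conn.execute(
        "SELECT output_diagram_code, output_text, error, raw_response"
        " FROM run_results"
    ).fetchone()
    conn.close()
    # Cleaned outputs come back as None; the raw text comes back unchanged.
    assert row == (None, None, "invalid JSON", malformed)


check_raw_response_round_trip()
```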

Note

This is a schema change. An existing DB without the column will reject inserts
(init_db uses CREATE TABLE IF NOT EXISTS and will not alter an existing
table), so the pre-change DB must be moved aside before a run. This coincides
with the version-bump re-baseline already required by the RC contract changes.
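A minimal illustration of that failure mode, using hypothetical two- and three-column schemas: `CREATE TABLE IF NOT EXISTS` is a no-op on the existing table, so the new column never appears and the insert is rejected. `has_column` is a hypothetical pre-flight check (not part of maestro) that a runner could use to fail fast before the first write.

```python
import sqlite3

# Hypothetical, trimmed schemas: before and after this PR.
OLD_SCHEMA = "CREATE TABLE IF NOT EXISTS run_results (id INTEGER PRIMARY KEY, error TEXT)"
NEW_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS run_results "
    "(id INTEGER PRIMARY KEY, error TEXT, raw_response TEXT)"
)


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` has `column`. `table` must be a trusted name."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


conn = sqlite3.connect(":memory:")
conn.execute(OLD_SCHEMA)  # simulate a pre-change DB
conn.execute(NEW_SCHEMA)  # no-op: the table exists, so nothing is altered
print(has_column(conn, "run_results", "raw_response"))  # False
try:
    conn.execute(
        "INSERT INTO run_results (error, raw_response) VALUES (?, ?)",
        ("invalid JSON", '{"a": '),
    )
except sqlite3.OperationalError as exc:
    print(f"insert rejected: {exc}")
```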

Summary by CodeRabbit

New Features
- The system now captures and stores unprocessed AI provider responses for enhanced post-failure analysis and debugging. This allows access to raw outputs (including malformed content) when standard processing fails.
Tests
- Added test coverage to validate raw response persistence across database operations and retry scenarios.

A failed cell previously kept only the error string; the model output that caused the failure (malformed JSON, broken Mermaid) was discarded, so a failure could not be inspected after the run. Add a nullable raw_response column to run_results and sub_results, populated by every provider and retained across retries even when the cleaned output is None. Also name empty-output provider failures accurately instead of "No attempts executed". raw_response is the model text only (no request envelope); HTTP status is omitted because the SDKs do not expose it on the 200-success path where the malformed-content failures occur. ~70-90 MB for the full matrix DB.

coderabbitai · 2026-06-20T22:55:47Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2e9df8bf-ef21-4de7-9b59-db60a3ef878a

📥 Commits

Reviewing files that changed from the base of the PR and between 557ff65 and 9d284ef.

📒 Files selected for processing (11)

src/maestro/db/client.py
src/maestro/db/queries.py
src/maestro/providers/anthropic.py
src/maestro/providers/gemini.py
src/maestro/providers/mistral.py
src/maestro/providers/openai.py
src/maestro/schemas.py
src/maestro/strategies/crew.py
src/maestro/strategies/langgraph.py
src/maestro/strategies/sop.py
tests/db/test_client.py

📝 Walkthrough


Adds a raw_response: str | None field to both RunResult and SubResult schemas to preserve unprocessed LLM output for post-failure analysis. All four providers set this field on the success path; all three strategy retry loops track the latest raw output via a last_raw accumulator and attach it to SubResult on both success and failure returns. The field is persisted to the corresponding SQLite tables with inline schema documentation and a new round-trip integration test.

Changes

raw_response diagnostic field end-to-end

| Layer / File(s) | Summary |
| --- | --- |
| Schema contracts: raw_response on RunResult and SubResult (`src/maestro/schemas.py`) | `RunResult` and `SubResult` each gain a nullable `raw_response: str \| None` field documenting that it holds unprocessed provider text when output parsing fails. |
| Provider success paths populate raw_response (`src/maestro/providers/anthropic.py`, `src/maestro/providers/gemini.py`, `src/maestro/providers/mistral.py`, `src/maestro/providers/openai.py`) | All four providers add `raw_response=output` to the success-path `RunResult` construction; error handling paths are unchanged. |
| Strategy retry loops track last_raw and wire SubResult.raw_response (`src/maestro/strategies/crew.py`, `src/maestro/strategies/langgraph.py`, `src/maestro/strategies/sop.py`) | Each strategy's step-execution method introduces a `last_raw` accumulator updated every attempt, propagated into `SubResult.raw_response` on both success and final failure. LangGraph and SOP also add the fallback error string `"empty output from provider"` when `result.error` is absent. |
| DB schema docs, query persistence, and integration test (`src/maestro/db/client.py`, `src/maestro/db/queries.py`, `tests/db/test_client.py`) | `SCHEMA` gains inline comments documenting `raw_response` in both tables; `insert_run_result` and `insert_sub_result` extend their `INSERT` column and parameter lists; a new test asserts `raw_response` survives a full write/read cycle while cleaned output columns remain `None`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Colinho22/maestro#34: Follows the same pattern of adding a new field (retry_count) to RunResult, extending provider complete() returns, and updating insert_run_result in queries.py — the same files and architectural path modified here for raw_response.

Poem

🐇 When the JSON arrives in a mangled heap,
No output to parse, no diagram to keep,
I tuck the raw text in a field of its own,
So even in failure, the truth can be shown.
raw_response lives on — no clue left unsown! 🌿

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(db): capture raw model output + accurate failure messages' directly and concisely summarizes the main changes: adding raw_response capture to the database and improving failure message accuracy across providers. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Docstring Coverage (Src Only) | ✅ Passed | All 33 public module-level entities across changed src/ files have docstrings (100% coverage), exceeding the 80% threshold. |


Colinho22 added this to the 🧪 Experimental Artifact milestone Jun 20, 2026

Colinho22 self-assigned this Jun 20, 2026

Colinho22 added the enhancement New feature or request label Jun 20, 2026

Colinho22 merged commit da31f8e into main Jun 20, 2026
2 checks passed

Colinho22 deleted the fix-tier3-issues branch June 20, 2026 23:02

Colinho22 merged 1 commit into main from fix-tier3-issues
