feat(db): capture raw model output + accurate failure messages #72
A failed cell previously kept only the error string; the model output that caused the failure (malformed JSON, broken Mermaid) was discarded, so a failure could not be inspected after the run. Add a nullable raw_response column to run_results and sub_results, populated by every provider and retained across retries even when the cleaned output is None. Also name empty-output provider failures accurately instead of "No attempts executed". raw_response is the model text only (no request envelope); HTTP status is omitted because the SDKs do not expose it on the 200-success path where the malformed-content failures occur. Size of the full matrix DB: ~70-90 MB.
Summary
A tier-3 cross-provider check surfaced that failed cells discarded the model
output that caused the failure (e.g. gemini-3.5-flash returning malformed JSON
on large extractions), leaving only the error string. The actual output was
only recoverable from the provider's own console. This PR persists it.
Changes
- `raw_response` column on `run_results` and `sub_results`: the unprocessed model output, retained even when the cell fails (when `output_diagram_code`/`output_text` is None) and across retries. Populated by all five providers and all three multi-step strategies. This makes every failure analysable after the run without re-calling the model.
- `success=False` with no error string (an empty/blank diagram) is now recorded as "empty output from provider" instead of the misleading "No attempts executed".
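
To illustrate the two changes above, here is a minimal sketch of a retry loop that keeps the raw text and names empty output accurately. It is not the code in this PR: `CellResult`, `run_cell`, the `call_model` callable, the JSON-only validation and the "keep the latest attempt's raw text" policy are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CellResult:
    # Hypothetical result record; field names mirror the columns named above.
    success: bool
    output_text: Optional[str]
    raw_response: Optional[str]
    error: Optional[str]


def run_cell(call_model: Callable[[], str], max_attempts: int = 2) -> CellResult:
    """Call the model up to max_attempts times, always keeping the raw text."""
    raw: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_attempts):
        raw = call_model()  # assumption: retain the most recent attempt's text
        if not raw.strip():
            error = None  # blank output: no error string from this attempt
            continue
        try:
            json.loads(raw)
        except json.JSONDecodeError as exc:
            error = f"malformed JSON: {exc.msg}"
            continue
        return CellResult(True, raw, raw, None)
    if error is None:
        # Failed without an error string: name the failure accurately.
        error = "empty output from provider"
    return CellResult(False, None, raw, error)


# A cell whose every attempt returns truncated JSON.
truncated = run_cell(lambda: '{"a": ')
# A cell whose every attempt returns only whitespace.
blank = run_cell(lambda: "   ")
```

In this sketch `truncated` fails with its cleaned output set to None while `raw_response` still holds the truncated JSON, and `blank` is recorded with "empty output from provider".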
Design notes
- `raw_response` stores the model text only, not the request envelope. HTTP status (2xx/4xx/5xx) is deliberately omitted: the SDKs do not expose it on the 200-success path, and the failures of interest (malformed content) are HTTP 200s, so a status column would be inferred and misleading. `raw_response` plus the existing `error` string fully capture each failure.
- The full matrix DB is ~70-90 MB (current is 2.3 MB for 382 cells), well within SQLite's range.
Testing
- `test_raw_response_survives_a_failed_cell`: a failed cell round-trips through the DB keeping its raw output while the cleaned output is None.
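
The round trip described above can be sketched against an in-memory SQLite database. This is not the test in the PR: the reduced `run_results` schema and the helper names `init_sketch_db` and `failed_cell_round_trip` are hypothetical, and the real table has other columns.

```python
import sqlite3


def init_sketch_db(conn: sqlite3.Connection) -> None:
    # Reduced, assumed schema: raw_response is nullable, like output_text.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS run_results (
            id INTEGER PRIMARY KEY,
            success INTEGER NOT NULL,
            output_text TEXT,
            raw_response TEXT,
            error TEXT
        )
        """
    )


def failed_cell_round_trip(raw: str, error: str) -> tuple:
    """Store a failed cell and read it back from the database."""
    conn = sqlite3.connect(":memory:")
    try:
        init_sketch_db(conn)
        conn.execute(
            "INSERT INTO run_results (success, output_text, raw_response, error) "
            "VALUES (?, ?, ?, ?)",
            (0, None, raw, error),  # cleaned output is None on failure
        )
        return conn.execute(
            "SELECT success, output_text, raw_response, error FROM run_results"
        ).fetchone()
    finally:
        conn.close()


row = failed_cell_round_trip('{"nodes": [1, 2,', "malformed JSON")
```

The row read back keeps the raw text byte for byte while `output_text` stays NULL, which is the property the test is described as checking.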
Note
This is a schema change. An existing DB without the column will reject inserts (`init_db` uses CREATE TABLE IF NOT EXISTS and will not alter an existing table), so the pre-change DB must be moved aside before a run. This coincides with the version-bump re-baseline already required by the RC contract changes.
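
The failure mode in this note can be reproduced with a short sketch. The reduced table definitions and the `has_column` helper are assumptions for illustration, not the project's `init_db`; the SQLite behaviour shown (CREATE TABLE IF NOT EXISTS leaving an existing table untouched, and `PRAGMA table_info` listing its columns) is standard.

```python
import sqlite3

# Assumed, reduced schemas: before and after adding the column.
OLD_SCHEMA = "CREATE TABLE IF NOT EXISTS run_results (id INTEGER PRIMARY KEY, error TEXT)"
NEW_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS run_results "
    "(id INTEGER PRIMARY KEY, error TEXT, raw_response TEXT)"
)


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(OLD_SCHEMA)  # a DB created before the change
conn.execute(NEW_SCHEMA)  # no-op: the table already exists, so it is not altered

stale = not has_column(conn, "run_results", "raw_response")

try:
    conn.execute(
        "INSERT INTO run_results (error, raw_response) VALUES (?, ?)",
        ("malformed JSON", '{"a": '),
    )
    insert_rejected = False
except sqlite3.OperationalError:
    # SQLite reports that the table has no column named raw_response.
    insert_rejected = True

conn.close()
```

A check like `has_column` run before a matrix run would flag a stale DB early, so it can be moved aside instead of failing on the first insert.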