fix: propagate API error messages for auto-embeddings with empty/too-long input by rtmalikian · Pull Request #187 · manticoresoftware/columnar

rtmalikian · 2026-06-17T23:09:29Z

What was broken

When a remote embedding API (OpenAI, Voyage, Jina) returned an error for empty input or for input exceeding the model's token limit, the actual error message from the API was discarded. Users saw only a generic error:

ERROR 1064 (42000) at line 14: Failed to parse response from remote model

instead of the descriptive message the API actually returned, such as:

This model's maximum context length is 8192 tokens ... requested 8193

The same generic error appeared for empty string inputs (INSERT INTO test (id, abstract) VALUES (1, '')).

What the fix does

1. Pre-validation for empty inputs

All three remote model implementations (OpenAI, Voyage, Jina) now check that at least one non-empty text is provided before making an API call. This avoids unnecessary network round-trips and gives a clear local error message.
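As an illustration only, the pre-validation step could look roughly like the sketch below. The names `EmptyInputError` and `ensure_non_empty_input` are hypothetical and do not come from the PR, and treating whitespace-only strings as empty is an assumption of this sketch, not something the description above states.

```rust
/// Hypothetical error type for this sketch; the PR itself reports the
/// failure through `LibError`.
#[derive(Debug, PartialEq)]
struct EmptyInputError;

/// Returns `Ok(())` when at least one text is non-empty, so the caller can
/// skip the network round-trip when there is nothing to embed.
/// Assumption: whitespace-only strings are counted as empty.
fn ensure_non_empty_input(texts: &[&str]) -> Result<(), EmptyInputError> {
    if texts.iter().any(|text| !text.trim().is_empty()) {
        Ok(())
    } else {
        Err(EmptyInputError)
    }
}
```

A model implementation would call a check like this before building the HTTP request, and return early on `Err`.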

2. API error message propagation

Added a message field to the RemoteHttpError variant in LibError. When the API returns an error response with a JSON body containing error.message (OpenAI/Voyage) or error.message/detail (Jina), the actual error message is now extracted and included in the displayed error output.
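A minimal sketch of how an optional message can be appended to the displayed error, assuming a simplified stand-in for the real type. `SketchError` and its `status` field are hypothetical. Only the `message: Option<String>` field and the output format shown in the Before/After section are taken from this PR.

```rust
use std::fmt;

/// Simplified stand-in for `LibError`; only one variant is modelled here.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum SketchError {
    RemoteHttpError {
        status: u16,             // hypothetical field name
        message: Option<String>, // API-provided detail, when available
    },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::RemoteHttpError { status, message } => {
                write!(f, "HTTP error from remote model: status code {status}")?;
                // Append the API message only when one was extracted.
                if let Some(msg) = message {
                    write!(f, ": {msg}")?;
                }
                Ok(())
            }
        }
    }
}
```

With `message: None` the output falls back to the status-only form, so call sites that have no extra detail keep working.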

3. Descriptive messages for response parsing errors

All RemoteHttpError usages in the success response parsing path now include specific messages (e.g., "response missing 'data' array", "API returned no embeddings in response") instead of just the HTTP status code.

Files changed

embeddings/src/error.rs — Added message: Option<String> to RemoteHttpError, updated Display implementation
embeddings/src/model/openai.rs — Pre-validation + error message extraction
embeddings/src/model/voyage.rs — Pre-validation + error message extraction
embeddings/src/model/jina.rs — Pre-validation + error message extraction (supports both error.message and detail formats)

Before

ERROR 1064 (42000): Failed to parse response from remote model

After

ERROR 1064 (42000): HTTP error from remote model: status code 400: This model's maximum context length is 8192 tokens, however you requested 8193 tokens (8193 in your prompt; 0 for the completion)

For empty input:

ERROR 1064 (42000): HTTP error from remote model: status code 400: all input texts are empty - at least one non-empty text is required for embeddings

Verification

Reviewed all usages of RemoteHttpError across the codebase (28 instances across 3 model files)
Updated every instance to include the message field
Verified no test files construct RemoteHttpError directly (tests use the error indirectly through model operations)
The PartialEq, Eq, and Hash derives on LibError work correctly with Option<String>
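The last point can be checked in isolation with a small sketch. `DeriveDemo`, its `status` field, and `distinct_errors` are hypothetical names used only for illustration. The sketch relies on the standard library's `PartialEq`, `Eq`, and `Hash` implementations for `Option<String>`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an error enum that carries an optional message.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum DeriveDemo {
    RemoteHttpError { status: u16, message: Option<String> },
}

/// Counts distinct errors; this compiles only because the derived
/// `Eq` and `Hash` implementations cover the `Option<String>` field.
fn distinct_errors(errors: &[DeriveDemo]) -> usize {
    errors.iter().collect::<HashSet<_>>().len()
}
```

Two values with the same status but different messages compare as unequal, which matters if any code deduplicates or compares errors.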

About the Author: Raphael Malikian — Clinical AI Solutions Architect. I specialise in building and fixing AI/ML systems for healthcare, including vector databases, RAG pipelines, and clinical NLP. If you need help with your project or think I can add value to your organisation, feel free to reach out — I'd love to connect.

📧 [email protected]
🔗 GitHub: https://github.com/rtmalikian
🔗 LinkedIn: http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a

Disclosure: This code was developed with assistance from mimo-v2.5-pro (Xiaomi) via Hermes Agent (Nous Research). All changes were reviewed, tested against the actual codebase, and verified for correctness.

…long input

When a remote embedding API (OpenAI, Voyage, Jina) returns an error for empty input or input exceeding the model's token limit, the actual error message from the API was being discarded. Users would see a generic 'HTTP error from remote model: status code 400' instead of the descriptive message like 'maximum context length is 8192 tokens'.

Changes:
- Add 'message' field to RemoteHttpError variant to carry API error details
- Extract error message from API JSON response (error.message for OpenAI/Voyage, error.message or detail for Jina)
- Include API error message in displayed error output
- Add pre-validation for empty inputs before making API calls across all three remote model implementations (OpenAI, Voyage, Jina)
- Add descriptive messages for all RemoteHttpError usages in the success response parsing path

Fixes manticoresoftware#143

CLAassistant · 2026-06-17T23:09:35Z

All committers have signed the CLA.

sanikolaev · 2026-06-19T03:36:28Z

Thanks for the PR @rtmalikian
We'll test and review it.

donhardman · 2026-06-20T11:01:14Z

Hey @rtmalikian

Thanks for the PR. Let's fix the conflicts, fix the formatting with cargo fmt, and address any clippy issues. LGTM once CI passes.

sanikolaev requested a review from donhardman June 19, 2026 03:36

donhardman approved these changes Jun 20, 2026

rtmalikian wants to merge 1 commit into manticoresoftware:master from rtmalikian:fix/better-error-messages-auto-embeddings
