UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin by chandrasekharan-zipstack · Pull Request #1966 · Zipstack/unstract

chandrasekharan-zipstack · 2026-05-14T07:28:17Z

What

Builds on UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965 (foundation PR) by routing every AuditSerializer subclass through SanitizedSerializerMixin. ~18 write-path serializers now auto-reject HTML/JS-shaped input on every writable CharField / TextField.
Affected (transitively): WorkflowSerializer, CustomToolSerializer, APIDeploymentSerializer, APIKeySerializer, ConnectorInstanceSerializer, BaseAdapterSerializer, PlatformKeySerializer, PlatformApiKey{Create,Update}Serializer, PipelineSerializer, ToolInstanceSerializer, ToolStudioPromptSerializer, PromptStudioOutputSerializer, ProfileManagerSerializer, IndexManagerSerializer, PromptStudioRegistry{,Info}Serializer, PromptStudioDocumentManagerSerializer.
Adds Meta.html_safe_fields opt-outs on the two serializers that legitimately carry LLM markup:
- ToolStudioPromptSerializer → prompt, assert_prompt, assertion_failure_prompt, output.
- CustomToolSerializer → summarize_prompt, preamble, postamble, output.
Removes the three redundant validate_description methods (in api_v2.APIDeploymentSerializer, workflow_v2.WorkflowSerializer, prompt_studio_core_v2.CustomToolSerializer) — the mixin now covers them.

Why

The foundation PR (UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965) added SanitizedSerializerMixin and pre-mixed ModelSerializer / Serializer / HyperlinkedModelSerializer under utils.serializer but did not wire any production serializer through them. This PR is the smallest, highest-leverage wiring step: changing one parent class (AuditSerializer) closes the input-validation gap on the majority of write-path entities.
validate_<name> methods (which use validate_name_field) are kept because they also strip whitespace and reject empty values — behaviour the mixin doesn't duplicate. Only the validate_description methods (which just call validate_no_html_tags) are removed.
Free-form LLM prompt / output fields (prompt, output, summarize_prompt, etc.) legitimately contain <context> / <thinking> / arbitrary model output. They're explicitly opted out via Meta.html_safe_fields.

How

backend/backend/serializers.py: AuditSerializer now inherits utils.serializer.ModelSerializer instead of rest_framework.serializers.ModelSerializer. create / update semantics unchanged.
Per-serializer Meta.html_safe_fields declarations on the two LLM-bearing serializers (Prompt Studio).
Three validate_description removals + validate_no_html_tags import drop from the affected modules.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Functional: The 8 serializers that were already hand-wired (in the in-flight fix) continue to behave identically. The 14 manual validate_<field> methods still exist — the mixin's validator is appended to the field's validators list, so the same rejection is enforced either way. Removing only the redundant validate_description methods avoids double-validation noise but doesn't change any acceptance outcome.
New rejections: AuditSerializer subclasses that were not previously sanitised now auto-reject HTML in their CharField / TextField (e.g. ConnectorInstance.connector_name, PlatformKey.key_name, Pipeline.pipeline_name, AdapterInstance.adapter_name, etc.). Per the May 2026 US-prod scan (see KB Obsidian Vault/zipstuff/UN-3393-input-validation/05-prod-baseline.md), zero legitimate user content uses <…> in these columns — every match was pentest leftover. Migration risk: ~zero.
LLM fields explicitly opted out: prompt, assert_prompt, assertion_failure_prompt, output, summarize_prompt, preamble, postamble are listed in Meta.html_safe_fields so they continue to accept arbitrary text.
Direct DRF-base-class users: Serializers that inherit directly from serializers.ModelSerializer / serializers.Serializer / serializers.HyperlinkedModelSerializer (i.e. not via AuditSerializer) are not touched by this PR. They remain unchanged. A follow-up sweep will convert them; their absence here means write paths through them retain pre-PR behaviour.

Database Migrations

None.

Env Config

None.

Relevant Docs

KB: Obsidian Vault/zipstuff/UN-3393-input-validation/.
ADR 0003 (adr/0003-base-mixin-default-on-opt-out.md) — opt-out semantics.

Related Issues or PRs

Jira: UN-3393.
Stacked on: UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965 (foundation).
Follow-up: a separate PR that converts the remaining ~20 direct DRF-base-class users (tags/serializers.py, scheduler/serializer.py, tenant_account_v2/serializer.py, pipeline_v2/serializers/{internal,update,sharing,execute}.py, response-only and list serializers across the codebase, …).
Follow-up: cover backend/pluggable_apps/ (lives in the unstract-cloud repo; separate PR there).

Dependencies Versions

None.

Notes on Testing

Local tests in the OSS worktree (cloud-deps not installed, so manage.py check is unavailable here — CI runs the full suite):

cd backend && uv run pytest utils/tests/ -q

→ 85 passed (50 from foundation + 35 pre-existing utils tests).

Manual sanity:

cd backend && uv run python -c "from utils.serializer import ModelSerializer; from backend.serializers import AuditSerializer; assert issubclass(AuditSerializer, ModelSerializer); print('ok')"

Screenshots

n/a (no UI change)

Checklist

I have read and understood the Contribution Guidelines.

coderabbitai · 2026-05-14T07:28:23Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0768e501-3352-43d4-b1a2-5e2f40325838

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/UN-3393-input-validation-sweep

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps · 2026-05-14T07:33:01Z

Greptile Summary

This PR wires AuditSerializer (the shared base for ~18 write-path serializers) through SanitizedSerializerMixin, so HTML/JS-shaped input is auto-rejected on every writable CharField/TextField without per-field wiring. Two serializers that store raw LLM output (ToolStudioPromptSerializer, CustomToolSerializer) declare Meta.html_safe_fields opt-outs, and PromptStudioOutputSerializer received the same treatment in response to the prior review thread.

Backend: AuditSerializer now extends utils.serializer.ModelSerializer (mixin + DRF); three redundant validate_description methods removed; notification_v2 and tags serializers also migrated to the sanitized base class.
Frontend: PromptCard now parses DRF field-level validation_error responses and surfaces them as inline errors on the offending input rather than showing a generic alert banner; DocumentParser re-throws field errors so they propagate to PromptCard's catch handler.

Confidence Score: 5/5

Safe to merge — the change narrows input acceptance on non-LLM fields and is fully backed by the prod-baseline scan referenced in the PR description.

The AuditSerializer wiring is a one-line parent-class swap; create/update logic is untouched. Opt-outs for LLM fields are present and tested. The three removed validate_description methods were strictly redundant (DRF already short-circuits None before running validators, so the None-guard was dead code). Frontend error handling is correctly scoped: field errors surface inline while non-field errors continue to use the global alert banner, and the finally block ensures update-status is always cleared.

No files require special attention.

Important Files Changed

Filename	Overview
backend/backend/serializers.py	AuditSerializer base class now inherits from utils.serializer.ModelSerializer; create/update logic unchanged.
backend/utils/serializer/sanitization.py	Validator now uses field.label for user-facing error messages; DRF always sets label after bind so the fallback is harmless.
backend/prompt_studio/prompt_studio_v2/serializers.py	Adds html_safe_fields opt-outs for prompt, assert_prompt, assertion_failure_prompt, and output.
backend/prompt_studio/prompt_studio_core_v2/serializers.py	Adds html_safe_fields for LLM-facing fields and removes redundant validate_description.
backend/prompt_studio/prompt_studio_output_manager_v2/serializers.py	Adds html_safe_fields = (output, context) addressing prior review thread.
frontend/src/components/custom-tools/prompt-card/PromptCard.jsx	Adds fieldErrors state; catch handler parses DRF validation_error and surfaces inline errors per field.
frontend/src/components/custom-tools/document-parser/DocumentParser.jsx	Re-throws field-keyed DRF validation errors; non-field errors continue to show the global alert banner.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Backend["Backend — Serializer Hierarchy"]
        DRF["drf.ModelSerializer"]
        Mixin["SanitizedSerializerMixin"]
        UtilsMS["utils.serializer.ModelSerializer"]
        Audit["AuditSerializer"]
        Subs["~18 AuditSerializer subclasses"]
        ExemptA["ToolStudioPromptSerializer\nhtml_safe_fields: prompt, assert_prompt, output"]
        ExemptB["CustomToolSerializer\nhtml_safe_fields: summarize_prompt, preamble, postamble, output"]
        ExemptC["PromptStudioOutputSerializer\nhtml_safe_fields: output, context"]
        Direct["NotificationSerializer / TagSerializer"]
        DRF --> UtilsMS
        Mixin --> UtilsMS
        UtilsMS --> Audit
        Audit --> Subs
        Subs --> ExemptA
        Subs --> ExemptB
        Subs --> ExemptC
        UtilsMS --> Direct
    end
    subgraph Frontend["Frontend — Error Propagation"]
        PromptCard["PromptCard.handleChange"]
        DocParser["DocumentParser.handleChangePromptCard"]
        API["PATCH /api/prompt"]
        PromptCard -->|calls| DocParser
        DocParser -->|axios| API
        API -->|validation_error + attr| DocParser
        DocParser -->|re-throw| PromptCard
        PromptCard --> InlineErr["Inline error on EditableText"]
        API -->|other errors| DocParser
        DocParser --> Banner["Global alert banner"]
    end

_{Reviews (4): Last reviewed commit: "UN-3393 [FEAT] Inline field errors on pr..." | Re-trigger Greptile}

Builds on the foundation PR by routing every `AuditSerializer` subclass through `SanitizedSerializerMixin`. ~18 write-path serializers (`WorkflowSerializer`, `CustomToolSerializer`, `APIDeploymentSerializer`, `APIKeySerializer`, `ConnectorInstanceSerializer`, `BaseAdapterSerializer`, `PlatformKeySerializer`, `PlatformApiKey{Create,Update}Serializer`, `PipelineSerializer`, `ToolInstanceSerializer`, `ToolStudioPromptSerializer`, `PromptStudioOutputSerializer`, `ProfileManagerSerializer`, `IndexManagerSerializer`, `PromptStudioRegistry{,Info}Serializer`, `PromptStudioDocumentManagerSerializer`) now auto-reject HTML/JS-shaped input on every writable `CharField` / `TextField`. - `backend/backend/serializers.py`: `AuditSerializer` now inherits `utils.serializer.ModelSerializer` (the pre-mixed variant) instead of `rest_framework.serializers.ModelSerializer`. `create` / `update` semantics unchanged. - `prompt_studio/prompt_studio_v2.ToolStudioPromptSerializer`: declares `Meta.html_safe_fields = ("prompt", "assert_prompt", "assertion_failure_prompt", "output")`. LLM prompt text legitimately contains XML/HTML-like markup (e.g. `<context>`, `<thinking>`); LLM `output` may include anything the model produced. - `prompt_studio/prompt_studio_core_v2.CustomToolSerializer`: declares `Meta.html_safe_fields = ("summarize_prompt", "preamble", "postamble", "output")`. Tool-level LLM context fields and stored LLM output. - Removes redundant manual `validate_description(self, value)` methods in `api_v2.APIDeploymentSerializer`, `workflow_v2.WorkflowSerializer`, and `prompt_studio_core_v2.CustomToolSerializer`. The mixin now covers these via the `AuditSerializer` base. `validate_<name>` methods that use `validate_name_field` are retained because they also strip whitespace and reject empty values — behaviour the mixin doesn't duplicate. - Imports of `validate_no_html_tags` are dropped from the three files whose `validate_description` methods were removed. Files that inherit DRF base classes directly (without going through `AuditSerializer`) are tracked as a follow-up sweep. The AuditSerializer path already covers the highest-value write-path entities. Test coverage: `cd backend && uv run pytest utils/tests/ -q` → 85 passed. Full Django check requires cloud-deps; PR CI exercises the full suite. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Extends PR2 coverage to two user-write-path serializers that don't go through `AuditSerializer`: - `tags.TagSerializer` (user-create with `name` + `description`) now inherits `utils.serializer.ModelSerializer`. Sister `TagParamsSerializer` is a query-param parser with strict regex validation; left as-is. - `notification_v2.NotificationSerializer` now inherits `utils.serializer.ModelSerializer`. The model's `name`, `authorization_key`, `authorization_header`, and `url` (URLField is a CharField subclass) are auto-sanitized. `notification_type`, `authorization_type`, `platform` are ChoiceField in the serializer, not CharField; the mixin skips them. The existing manual `validate_name` keeps its uniqueness + strip-whitespace logic; the mixin's HTML check runs alongside as redundant defence. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Greptile P1 review comment on PR #1966. `PromptStudioOutputManager` stores raw LLM responses (`output`: CharField) and document chunks (`context`: TextField) that routinely contain <thinking>, <context>, and other XML-like tags. The serializer is mounted on a ModelViewSet with no class-level HTTP-method restriction, so write paths through DRF admin / browsable API / any future write endpoint would incorrectly reject legitimate LLM output without this opt-out. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

- Mixin now emits "Prompt key …" instead of "prompt_key …" by using `field.label` or a title-cased fallback when invoking the validator. - DocumentParser re-throws DRF field-keyed validation errors so the prompt card can render them inline instead of showing only a transient toast. Non-field errors continue to surface via the existing toast. - PromptCard tracks per-field error state, keeps the typed value so users can edit-and-fix in place, and clears the error on the next edit attempt for the same field. - EditableText accepts an `error` prop, applies Ant `status="error"` on the input, and renders the message in a danger Text node below. Co-Authored-By: Claude Opus 4.7 <[email protected]>

sonarqubecloud · 2026-05-19T12:06:21Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

greptile-apps Bot reviewed May 14, 2026

View reviewed changes

Comment thread backend/backend/serializers.py

chandrasekharan-zipstack force-pushed the feat/UN-3393-input-validation-sweep branch from 93b7ecd to 1e8e4ef Compare May 15, 2026 10:43

chandrasekharan-zipstack and others added 3 commits May 15, 2026 17:15

chandrasekharan-zipstack force-pushed the feat/UN-3393-input-validation-sweep branch from 1e8e4ef to 6ce8ea8 Compare May 15, 2026 11:46

chandrasekharan-zipstack self-assigned this May 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin#1966

UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin#1966
chandrasekharan-zipstack wants to merge 4 commits into
feat/UN-3393-input-validation-foundationfrom
feat/UN-3393-input-validation-sweep

chandrasekharan-zipstack commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Review skipped

Uh oh!

greptile-apps Bot commented May 14, 2026 •

edited

Loading

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

chandrasekharan-zipstack commented May 14, 2026

What

Why

How

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

greptile-apps Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 19, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 14, 2026 •

edited

Loading

greptile-apps Bot commented May 14, 2026 •

edited

Loading