Skip to content

UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin#1966

Open
chandrasekharan-zipstack wants to merge 4 commits into
feat/UN-3393-input-validation-foundationfrom
feat/UN-3393-input-validation-sweep
Open

UN-3393 [FEAT] Wire AuditSerializer subclasses through SanitizedSerializerMixin#1966
chandrasekharan-zipstack wants to merge 4 commits into
feat/UN-3393-input-validation-foundationfrom
feat/UN-3393-input-validation-sweep

Conversation

@chandrasekharan-zipstack
Copy link
Copy Markdown
Contributor

What

  • Builds on UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965 (foundation PR) by routing every AuditSerializer subclass through SanitizedSerializerMixin. ~18 write-path serializers now auto-reject HTML/JS-shaped input on every writable CharField / TextField.
  • Affected (transitively): WorkflowSerializer, CustomToolSerializer, APIDeploymentSerializer, APIKeySerializer, ConnectorInstanceSerializer, BaseAdapterSerializer, PlatformKeySerializer, PlatformApiKey{Create,Update}Serializer, PipelineSerializer, ToolInstanceSerializer, ToolStudioPromptSerializer, PromptStudioOutputSerializer, ProfileManagerSerializer, IndexManagerSerializer, PromptStudioRegistry{,Info}Serializer, PromptStudioDocumentManagerSerializer.
  • Adds Meta.html_safe_fields opt-outs on the two serializers that legitimately carry LLM markup:
    • ToolStudioPromptSerializerprompt, assert_prompt, assertion_failure_prompt, output.
    • CustomToolSerializersummarize_prompt, preamble, postamble, output.
  • Removes the three redundant validate_description methods (in api_v2.APIDeploymentSerializer, workflow_v2.WorkflowSerializer, prompt_studio_core_v2.CustomToolSerializer) — the mixin now covers them.

Why

  • The foundation PR (UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965) added SanitizedSerializerMixin and pre-mixed ModelSerializer / Serializer / HyperlinkedModelSerializer under utils.serializer but did not wire any production serializer through them. This PR is the smallest, highest-leverage wiring step: changing one parent class (AuditSerializer) closes the input-validation gap on the majority of write-path entities.
  • validate_<name> methods (which use validate_name_field) are kept because they also strip whitespace and reject empty values — behaviour the mixin doesn't duplicate. Only the validate_description methods (which just call validate_no_html_tags) are removed.
  • Free-form LLM prompt / output fields (prompt, output, summarize_prompt, etc.) legitimately contain <context> / <thinking> / arbitrary model output. They're explicitly opted out via Meta.html_safe_fields.

How

  • backend/backend/serializers.py: AuditSerializer now inherits utils.serializer.ModelSerializer instead of rest_framework.serializers.ModelSerializer. create / update semantics unchanged.
  • Per-serializer Meta.html_safe_fields declarations on the two LLM-bearing serializers (Prompt Studio).
  • Three validate_description removals + validate_no_html_tags import drop from the affected modules.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • Functional: The 8 serializers that were already hand-wired (in the in-flight fix) continue to behave identically. The 14 manual validate_<field> methods still exist — the mixin's validator is appended to the field's validators list, so the same rejection is enforced either way. Removing only the redundant validate_description methods avoids double-validation noise but doesn't change any acceptance outcome.
  • New rejections: AuditSerializer subclasses that were not previously sanitised now auto-reject HTML in their CharField / TextField (e.g. ConnectorInstance.connector_name, PlatformKey.key_name, Pipeline.pipeline_name, AdapterInstance.adapter_name, etc.). Per the May 2026 US-prod scan (see KB Obsidian Vault/zipstuff/UN-3393-input-validation/05-prod-baseline.md), zero legitimate user content uses <…> in these columns — every match was pentest leftover. Migration risk: ~zero.
  • LLM fields explicitly opted out: prompt, assert_prompt, assertion_failure_prompt, output, summarize_prompt, preamble, postamble are listed in Meta.html_safe_fields so they continue to accept arbitrary text.
  • Direct DRF-base-class users: Serializers that inherit directly from serializers.ModelSerializer / serializers.Serializer / serializers.HyperlinkedModelSerializer (i.e. not via AuditSerializer) are not touched by this PR. They remain unchanged. A follow-up sweep will convert them; their absence here means write paths through them retain pre-PR behaviour.

Database Migrations

  • None.

Env Config

  • None.

Relevant Docs

  • KB: Obsidian Vault/zipstuff/UN-3393-input-validation/.
  • ADR 0003 (adr/0003-base-mixin-default-on-opt-out.md) — opt-out semantics.

Related Issues or PRs

  • Jira: UN-3393.
  • Stacked on: UN-3393 [FEAT] Add SanitizedSerializerMixin foundation for input validation #1965 (foundation).
  • Follow-up: a separate PR that converts the remaining ~20 direct DRF-base-class users (tags/serializers.py, scheduler/serializer.py, tenant_account_v2/serializer.py, pipeline_v2/serializers/{internal,update,sharing,execute}.py, response-only and list serializers across the codebase, …).
  • Follow-up: cover backend/pluggable_apps/ (lives in the unstract-cloud repo; separate PR there).

Dependencies Versions

  • None.

Notes on Testing

Local tests in the OSS worktree (cloud-deps not installed, so manage.py check is unavailable here — CI runs the full suite):

cd backend && uv run pytest utils/tests/ -q

→ 85 passed (50 from foundation + 35 pre-existing utils tests).

Manual sanity:

cd backend && uv run python -c "from utils.serializer import ModelSerializer; from backend.serializers import AuditSerializer; assert issubclass(AuditSerializer, ModelSerializer); print('ok')"

Screenshots

n/a (no UI change)

Checklist

I have read and understood the Contribution Guidelines.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0768e501-3352-43d4-b1a2-5e2f40325838

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/UN-3393-input-validation-sweep

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR wires AuditSerializer (the shared base for ~18 write-path serializers) through SanitizedSerializerMixin, so HTML/JS-shaped input is auto-rejected on every writable CharField/TextField without per-field wiring. Two serializers that store raw LLM output (ToolStudioPromptSerializer, CustomToolSerializer) declare Meta.html_safe_fields opt-outs, and PromptStudioOutputSerializer received the same treatment in response to the prior review thread.

  • Backend: AuditSerializer now extends utils.serializer.ModelSerializer (mixin + DRF); three redundant validate_description methods removed; notification_v2 and tags serializers also migrated to the sanitized base class.
  • Frontend: PromptCard now parses DRF field-level validation_error responses and surfaces them as inline errors on the offending input rather than showing a generic alert banner; DocumentParser re-throws field errors so they propagate to PromptCard's catch handler.

Confidence Score: 5/5

Safe to merge — the change narrows input acceptance on non-LLM fields and is fully backed by the prod-baseline scan referenced in the PR description.

The AuditSerializer wiring is a one-line parent-class swap; create/update logic is untouched. Opt-outs for LLM fields are present and tested. The three removed validate_description methods were strictly redundant (DRF already short-circuits None before running validators, so the None-guard was dead code). Frontend error handling is correctly scoped: field errors surface inline while non-field errors continue to use the global alert banner, and the finally block ensures update-status is always cleared.

No files require special attention.

Important Files Changed

Filename Overview
backend/backend/serializers.py AuditSerializer base class now inherits from utils.serializer.ModelSerializer; create/update logic unchanged.
backend/utils/serializer/sanitization.py Validator now uses field.label for user-facing error messages; DRF always sets label after bind so the fallback is harmless.
backend/prompt_studio/prompt_studio_v2/serializers.py Adds html_safe_fields opt-outs for prompt, assert_prompt, assertion_failure_prompt, and output.
backend/prompt_studio/prompt_studio_core_v2/serializers.py Adds html_safe_fields for LLM-facing fields and removes redundant validate_description.
backend/prompt_studio/prompt_studio_output_manager_v2/serializers.py Adds html_safe_fields = (output, context) addressing prior review thread.
frontend/src/components/custom-tools/prompt-card/PromptCard.jsx Adds fieldErrors state; catch handler parses DRF validation_error and surfaces inline errors per field.
frontend/src/components/custom-tools/document-parser/DocumentParser.jsx Re-throws field-keyed DRF validation errors; non-field errors continue to show the global alert banner.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Backend["Backend — Serializer Hierarchy"]
        DRF["drf.ModelSerializer"]
        Mixin["SanitizedSerializerMixin"]
        UtilsMS["utils.serializer.ModelSerializer"]
        Audit["AuditSerializer"]
        Subs["~18 AuditSerializer subclasses"]
        ExemptA["ToolStudioPromptSerializer\nhtml_safe_fields: prompt, assert_prompt, output"]
        ExemptB["CustomToolSerializer\nhtml_safe_fields: summarize_prompt, preamble, postamble, output"]
        ExemptC["PromptStudioOutputSerializer\nhtml_safe_fields: output, context"]
        Direct["NotificationSerializer / TagSerializer"]
        DRF --> UtilsMS
        Mixin --> UtilsMS
        UtilsMS --> Audit
        Audit --> Subs
        Subs --> ExemptA
        Subs --> ExemptB
        Subs --> ExemptC
        UtilsMS --> Direct
    end
    subgraph Frontend["Frontend — Error Propagation"]
        PromptCard["PromptCard.handleChange"]
        DocParser["DocumentParser.handleChangePromptCard"]
        API["PATCH /api/prompt"]
        PromptCard -->|calls| DocParser
        DocParser -->|axios| API
        API -->|validation_error + attr| DocParser
        DocParser -->|re-throw| PromptCard
        PromptCard --> InlineErr["Inline error on EditableText"]
        API -->|other errors| DocParser
        DocParser --> Banner["Global alert banner"]
    end
Loading

Reviews (4): Last reviewed commit: "UN-3393 [FEAT] Inline field errors on pr..." | Re-trigger Greptile

Comment thread backend/backend/serializers.py
@chandrasekharan-zipstack chandrasekharan-zipstack force-pushed the feat/UN-3393-input-validation-sweep branch from 93b7ecd to 1e8e4ef Compare May 15, 2026 10:43
chandrasekharan-zipstack and others added 3 commits May 15, 2026 17:15
Builds on the foundation PR by routing every `AuditSerializer` subclass
through `SanitizedSerializerMixin`. ~18 write-path serializers
(`WorkflowSerializer`, `CustomToolSerializer`, `APIDeploymentSerializer`,
`APIKeySerializer`, `ConnectorInstanceSerializer`, `BaseAdapterSerializer`,
`PlatformKeySerializer`, `PlatformApiKey{Create,Update}Serializer`,
`PipelineSerializer`, `ToolInstanceSerializer`, `ToolStudioPromptSerializer`,
`PromptStudioOutputSerializer`, `ProfileManagerSerializer`,
`IndexManagerSerializer`, `PromptStudioRegistry{,Info}Serializer`,
`PromptStudioDocumentManagerSerializer`) now auto-reject HTML/JS-shaped
input on every writable `CharField` / `TextField`.

- `backend/backend/serializers.py`: `AuditSerializer` now inherits
  `utils.serializer.ModelSerializer` (the pre-mixed variant) instead of
  `rest_framework.serializers.ModelSerializer`. `create` / `update`
  semantics unchanged.

- `prompt_studio/prompt_studio_v2.ToolStudioPromptSerializer`: declares
  `Meta.html_safe_fields = ("prompt", "assert_prompt",
  "assertion_failure_prompt", "output")`. LLM prompt text legitimately
  contains XML/HTML-like markup (e.g. `<context>`, `<thinking>`); LLM
  `output` may include anything the model produced.

- `prompt_studio/prompt_studio_core_v2.CustomToolSerializer`: declares
  `Meta.html_safe_fields = ("summarize_prompt", "preamble", "postamble",
  "output")`. Tool-level LLM context fields and stored LLM output.

- Removes redundant manual `validate_description(self, value)` methods
  in `api_v2.APIDeploymentSerializer`,
  `workflow_v2.WorkflowSerializer`, and
  `prompt_studio_core_v2.CustomToolSerializer`. The mixin now covers
  these via the `AuditSerializer` base. `validate_<name>` methods that
  use `validate_name_field` are retained because they also strip
  whitespace and reject empty values — behaviour the mixin doesn't
  duplicate.

- Imports of `validate_no_html_tags` are dropped from the three files
  whose `validate_description` methods were removed.

Files that inherit DRF base classes directly (without going through
`AuditSerializer`) are tracked as a follow-up sweep. The AuditSerializer
path already covers the highest-value write-path entities.

Test coverage: `cd backend && uv run pytest utils/tests/ -q` → 85 passed.
Full Django check requires cloud-deps; PR CI exercises the full suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Extends PR2 coverage to two user-write-path serializers that don't go
through `AuditSerializer`:

- `tags.TagSerializer` (user-create with `name` + `description`) now
  inherits `utils.serializer.ModelSerializer`. Sister
  `TagParamsSerializer` is a query-param parser with strict regex
  validation; left as-is.

- `notification_v2.NotificationSerializer` now inherits
  `utils.serializer.ModelSerializer`. The model's `name`,
  `authorization_key`, `authorization_header`, and `url` (URLField is a
  CharField subclass) are auto-sanitized. `notification_type`,
  `authorization_type`, `platform` are ChoiceField in the serializer,
  not CharField; the mixin skips them. The existing manual
  `validate_name` keeps its uniqueness + strip-whitespace logic; the
  mixin's HTML check runs alongside as redundant defence.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Greptile P1 review comment on PR #1966. `PromptStudioOutputManager`
stores raw LLM responses (`output`: CharField) and document chunks
(`context`: TextField) that routinely contain <thinking>, <context>,
and other XML-like tags. The serializer is mounted on a ModelViewSet
with no class-level HTTP-method restriction, so write paths through
DRF admin / browsable API / any future write endpoint would
incorrectly reject legitimate LLM output without this opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@chandrasekharan-zipstack chandrasekharan-zipstack force-pushed the feat/UN-3393-input-validation-sweep branch from 1e8e4ef to 6ce8ea8 Compare May 15, 2026 11:46
- Mixin now emits "Prompt key …" instead of "prompt_key …" by using
  `field.label` or a title-cased fallback when invoking the validator.
- DocumentParser re-throws DRF field-keyed validation errors so the
  prompt card can render them inline instead of showing only a transient
  toast. Non-field errors continue to surface via the existing toast.
- PromptCard tracks per-field error state, keeps the typed value so
  users can edit-and-fix in place, and clears the error on the next
  edit attempt for the same field.
- EditableText accepts an `error` prop, applies Ant `status="error"`
  on the input, and renders the message in a danger Text node below.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant