Skip to content

fix: use SafeLoader for YAML parsing and harden script/import paths#348

Open
georgepwall1991 wants to merge 2 commits into
darrenburns:mainfrom
georgepwall1991:fix/yaml-deserialization-security
Open

fix: use SafeLoader for YAML parsing and harden script/import paths#348
georgepwall1991 wants to merge 2 commits into
darrenburns:mainfrom
georgepwall1991:fix/yaml-deserialization-security

Conversation

@georgepwall1991

@georgepwall1991 georgepwall1991 commented Apr 7, 2026

Copy link
Copy Markdown

Description

While poking around the codebase I noticed that yaml.py imports CLoader/Loader rather than CSafeLoader/SafeLoader. Both CLoader and Loader will happily deserialize arbitrary Python objects via YAML tags like !!python/object/apply:os.system, which means a crafted .posting.yaml file in a shared collection can execute arbitrary code the moment Posting scans the directory — no user interaction needed beyond opening the collection.

Since collections are designed to be shared as YAML files in git repos, this felt worth fixing. The theme loader (themes.py) had the same issue using yaml.FullLoader.

While I was in there I also noticed a few related path-safety issues in the script executor, the import handlers, and the persistence layer, so I've included fixes for those too.

Changes

src/posting/yaml.py

Swapped CLoader/Loader for CSafeLoader/SafeLoader. SafeLoader only handles plain data types (strings, dicts, lists, numbers, booleans), which is all Posting's YAML files actually use. The custom str_presenter was already registered on SafeRepresenter, so dump behaviour is unchanged.

src/posting/themes.py

Replaced yaml.load(theme_file, Loader=yaml.FullLoader) with yaml.safe_load(theme_file) — theme files are just color/font mappings, no need for the unsafe loader.

src/posting/scripts.py

Two things here:

  1. The is_relative_to() path-boundary check was comparing a resolved script path against an unresolved collection_root. If the collection root itself is a symlink, resolve() produces a path that isn't relative to the unresolved root, which means the check could be bypassed. Now both sides are resolved before comparison.

  2. During script loading the script's parent directory gets temporarily inserted at sys.path[0]. A co-located file like json.py or httpx.py could shadow stdlib/third-party imports for the rest of the process while it's on the path. The module name is now prefixed with _posting_script_ to reduce collision risk, and the sys.path.remove() in the finally block is guarded against ValueError.

src/posting/importing/postman.py

Postman folder names (item.name) were used directly in path construction with no sanitization, so a folder named ../../etc could write files outside the target directory during import. Folder names are now stripped of everything except \w, spaces, hyphens, and dots — and names that collide after sanitization get a _2, _3 suffix so distinct folders don't silently merge. The sanitized name is used for both Collection.path and Collection.name (the latter is what save_to_disk() actually uses for directory creation).

src/posting/collection.py

This is the core of the second commit. Collection.save_to_disk() was building filesystem paths directly from request.name and child.name with no sanitization, which meant any importer that passed through unsanitized names (OpenAPI operation summaries contain slashes, for example) could write files outside the intended directory.

Rather than patching each importer individually, I centralised the defence here:

  • Added _sanitize_path_component() — strips a string down to a safe single-path component (removes separators, traversal sequences, problematic characters), returns "unnamed" when nothing is left.
  • Added _unique_safe_name() — wraps the sanitizer and appends _2, _3 suffixes to avoid collisions.
  • save_to_disk() now runs every request name and child name through these helpers before using them as path components. Request names and child names are tracked in separate used-name sets since they don't actually collide on disk (users.posting.yaml and users/ coexist fine).

src/posting/__main__.py

The default output directory for posting import was built from collection.name without sanitization, so a malicious collection name could escape the default collection directory. Now uses _unique_output_dir() which sanitizes the name and picks a non-existing directory to avoid silently merging with a previous import. The .env filename for Postman imports is also sanitized.

tests/test_security.py (new)

29 regression tests covering:

  • YAML tag rejection (!!python/object/apply, !!python/object, !!python/module) and safe data loading
  • Theme YAML safety
  • Script path traversal (parent traversal, absolute paths, symlink escape, symlinked collection root)
  • Postman folder sanitization (traversal, slashes, empty names, deduplication of colliding names, preservation of valid names)
  • Collection.save_to_disk sanitization (request name traversal, child name traversal, OpenAPI-style slash names, duplicate name deduplication)
  • _sanitize_path_component and _unique_safe_name unit tests
  • sys.path restoration after both successful and failing script execution

Testing

All existing tests continue to pass (194 existing + 29 new = 223 total, 4 skipped). The SafeLoader change doesn't affect any existing fixture files since they only contain plain YAML data structures.

Posting uses `yaml.CLoader` (falling back to `yaml.Loader`) for loading
`.posting.yaml` collection files, and `yaml.FullLoader` for theme files.
Both loaders can deserialize arbitrary Python objects via tags like
`!!python/object/apply:os.system`, which means opening a malicious
collection or theme file triggers arbitrary code execution — no user
interaction required beyond loading the directory.

Since collections are designed to be shared (YAML files in git repos),
this is a realistic attack surface: a poisoned `.posting.yaml` in a
cloned collection runs code the moment Posting scans it.

Changes:

- yaml.py: swap `CLoader`/`Loader` for `CSafeLoader`/`SafeLoader`.
  SafeLoader only deserializes plain data types (strings, dicts, lists,
  numbers, booleans) which is all Posting's YAML files use. The custom
  `str_presenter` was already registered on `SafeRepresenter` so dumping
  is unaffected.

- themes.py: replace `yaml.FullLoader` with `yaml.safe_load` for the
  same reason — theme files are just color/font mappings.

- scripts.py: resolve `collection_root` before the `is_relative_to`
  check so symlinked collection roots can't bypass the path boundary.
  Also prefix imported module names with `_posting_script_` to avoid
  shadowing stdlib/third-party modules while the script's directory is
  temporarily on `sys.path`, and guard the `sys.path.remove()` in the
  finally block.

- postman.py: sanitize Postman folder names during import to strip path
  separators and traversal sequences. Uses the sanitized name for both
  `Collection.path` and `Collection.name` (the latter is what
  `save_to_disk()` actually uses for directory creation). Deduplicates
  names that collide after sanitization so distinct folders don't merge.

- tests/test_security.py: 20 regression tests covering YAML tag
  rejection, script path traversal (including symlink escape), Postman
  folder sanitization with deduplication, and sys.path restoration.
The previous commit sanitized folder names at the Postman import layer,
but Collection.save_to_disk() itself still built filesystem paths
directly from request.name and child.name. This meant OpenAPI imports
(which derive names from operation summaries and URL paths containing
slashes) could write files outside the target directory, and any future
importer would inherit the same exposure.

Moved sanitization into save_to_disk() so it applies regardless of where
the collection data came from. Request names and child collection names
are now stripped of path separators and traversal sequences before being
used as path components. Names that collide after sanitization get a _2,
_3 suffix so distinct items don't silently merge.

Requests and child collections track used names separately since they
occupy different namespaces on disk (files vs directories) — a request
called "users" and a sub-collection called "users" coexist as
users.posting.yaml and users/ without conflict.

Also sanitized the default output directory and env filename in
__main__.py. Repeated imports with names that sanitize identically now
get unique directories rather than merging into the same one.

Added 9 new tests covering save_to_disk traversal blocking, OpenAPI-
style slash names, duplicate name deduplication, and the sanitize helper
functions directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant