[Security] Vector store metadata uses pickle: trivial RCE if a .pkl file is tampered with #348

@GunaPalanivel

Problem

vectorstore_utils.py serializes and deserializes FAISS metadata using pickle.dump() and pickle.load():

# rag/vectorstore/vectorstore_utils.py:57-78
import pickle

def save_metadata(metadata, path):
    with open(path, "wb") as f:
        pickle.dump(metadata, f)

def load_metadata(path):
    with open(path, "rb") as f:
        return pickle.load(f)  # <-- arbitrary code execution

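Python's pickle can execute arbitrary code during deserialization: a pickle stream can tell the loader to import any callable (for example os.system) and call it, and pickle.load() performs that call. If an attacker can replace a .pkl file on disk (via a supply chain attack on the data pipeline, a compromised scraping source, or a path traversal bug elsewhere), they get full RCE with the privileges of the server process.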
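To make the risk concrete, here is a minimal, harmless sketch of how a tampered file turns into code execution. It assumes a loader with the same shape as the load_metadata quoted above. MaliciousMetadata is a hypothetical attacker payload: it calls eval on a harmless expression, where a real attack would return something like os.system with a shell command.

```python
import os
import pickle
import tempfile

def load_metadata(path):
    # Same shape as the vulnerable loader quoted in this issue.
    with open(path, "rb") as f:
        return pickle.load(f)

class MaliciousMetadata:
    # pickle rebuilds this object by calling whatever __reduce__ returns.
    # Here that is eval("6 * 7"); a real payload could just as easily be
    # (os.system, ("<shell command>",)).
    def __reduce__(self):
        return (eval, ("6 * 7",))

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "metadata.pkl")
    # Attacker overwrites the metadata file on disk.
    with open(pkl_path, "wb") as f:
        pickle.dump(MaliciousMetadata(), f)
    # The server's normal load path now evaluates attacker-chosen code.
    loaded = load_metadata(pkl_path)

print(loaded)  # 42: eval() ran inside pickle.load()
```

Nothing in load_metadata looks suspicious. The dangerous call is selected entirely by the bytes in the file.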

The stored metadata is just a list of dicts with string values (id, text, source, code_blocks). JSON represents this structure directly, so there is no reason to use pickle here.

Impact

  • Arbitrary code execution on the server if any .pkl file is modified
  • The data pipeline pulls from external URLs (plugin docs, Discourse, etc.) and writes these files, so any upstream compromise propagates to the server
  • The Python documentation has warned since Python 2.x that pickle is not secure and that data from untrusted or unauthenticated sources should never be unpickled

Proposed Fix

Replace pickle with json in vectorstore_utils.py:

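import json

def save_metadata(metadata, path):
    # JSON only encodes plain data: dicts, lists, strings, numbers, booleans, None
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f)

def load_metadata(path):
    # json.load only builds plain data structures; it never imports or calls code
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)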
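A quick sanity check of the proposed functions, written as a sketch. The sample records are invented for illustration and only mirror the field names listed in this issue. The check confirms that the metadata round-trips unchanged and that even a deliberately crafted JSON file loads as an ordinary dict.

```python
import json
import os
import tempfile

def save_metadata(metadata, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f)

def load_metadata(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Illustrative records only; real values come from the embedding pipeline.
sample = [
    {"id": "doc-1", "text": "Configuring a pipeline", "source": "plugin-docs", "code_blocks": "pipeline { }"},
    {"id": "doc-2", "text": "Agent setup", "source": "discourse", "code_blocks": ""},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.json")
    save_metadata(sample, path)
    restored = load_metadata(path)

    # Simulate tampering: the attacker controls the file contents completely.
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"__reduce__": "os.system", "args": ["touch /tmp/pwned"]}')
    tampered = load_metadata(path)

print(restored == sample)  # True
print(tampered)  # just a dict; nothing was imported or executed
```

JSON removes the code-execution path, but it does not guarantee integrity. A tampered .json file can still inject misleading text into retrieval results.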

Then update the embedding pipeline (store_embeddings.py) to write .json instead of .pkl, and update retriever_utils.py to read from the new path. Existing .pkl files need a one-time migration or rebuild.
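One possible approach to the one-time migration is sketched below. Converting old files requires reading them with pickle one last time. To limit that risk, the sketch uses a restricted unpickler, following the "Restricting Globals" pattern from the Python pickle docs, that refuses every class and function lookup. NoGlobalsUnpickler and migrate_pkl_to_json are hypothetical names. Writing the .json next to the .pkl is an assumption about the deployment layout. When the origin of a .pkl file is uncertain, rebuilding from the source data avoids unpickling altogether and is the safer choice.

```python
import json
import pickle
import tempfile
from pathlib import Path

class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any class or function.

    Plain lists, dicts and strings never go through find_class, so the
    legitimate metadata loads normally. Payloads that name a callable
    (eval, os.system, ...) fail before anything is called.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def migrate_pkl_to_json(pkl_path: Path) -> Path:
    """Convert one legacy .pkl metadata file into a sibling .json file."""
    with open(pkl_path, "rb") as f:
        metadata = NoGlobalsUnpickler(f).load()
    json_path = pkl_path.with_suffix(".json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f)
    return json_path

class _Payload:
    # Hypothetical tampered content, used only to show the guard working.
    def __reduce__(self):
        return (eval, ("6 * 7",))

with tempfile.TemporaryDirectory() as tmp:
    # A legitimate legacy file migrates cleanly.
    legacy = Path(tmp) / "metadata.pkl"
    records = [{"id": "doc-1", "text": "Agent setup", "source": "discourse", "code_blocks": ""}]
    with open(legacy, "wb") as f:
        pickle.dump(records, f)
    migrated = json.loads(migrate_pkl_to_json(legacy).read_text(encoding="utf-8"))

    # A tampered file is rejected instead of executed.
    evil = Path(tmp) / "evil.pkl"
    with open(evil, "wb") as f:
        pickle.dump(_Payload(), f)
    try:
        migrate_pkl_to_json(evil)
        blocked = False
    except pickle.UnpicklingError:
        blocked = True

print(migrated == records, blocked)  # True True
```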

Acceptance Criteria

  • pickle.load and pickle.dump removed from vectorstore_utils.py
  • Metadata saved as JSON (or MessagePack if JSON perf is a concern)
  • store_embeddings.py writes .json metadata files
  • retriever_utils.py reads .json metadata files
  • Existing unit tests in tests/unit/rag/vectorstore/ updated and passing
  • Migration note or rebuild script documented for existing deployments
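To stop the first criterion from regressing, the unit tests could include a static check along these lines. The helper imports_pickle is hypothetical, and it only detects import statements, not dynamic imports. A real test would read the source of rag/vectorstore/vectorstore_utils.py instead of using the inline strings shown here.

```python
import ast

BANNED_MODULES = {"pickle", "_pickle", "cPickle"}

def imports_pickle(source: str) -> bool:
    """Return True if the given module source has an import of a pickle module."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import pickle" or "import pickle as pkl"
            if any(alias.name.split(".")[0] in BANNED_MODULES for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # e.g. "from pickle import load"; module is None for "from . import x"
            if node.module and node.module.split(".")[0] in BANNED_MODULES:
                return True
    return False

OLD_SOURCE = """
import pickle

def load_metadata(path):
    with open(path, "rb") as f:
        return pickle.load(f)
"""

NEW_SOURCE = """
import json

def load_metadata(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
"""

print(imports_pickle(OLD_SOURCE))  # True
print(imports_pickle(NEW_SOURCE))  # False
```

In pytest, a check like this could be wrapped as an assertion that imports_pickle returns False for the real file's text. The path to use depends on the directory the tests run from.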

References

  • chatbot-core/rag/vectorstore/vectorstore_utils.py lines 57-78
  • chatbot-core/rag/vectorstore/store_embeddings.py lines 14-15
  • chatbot-core/rag/retriever/retriever_utils.py (reads the metadata)
  • Python docs on pickle security: https://docs.python.org/3/library/pickle.html
