
[Performance] is_valid_plugin() re-reads plugin_names.json from disk on every call - no caching #352

@sharma-sugurthi

Description


Summary

`is_valid_plugin()` in `chatbot-core/api/tools/utils.py` opens, reads, and JSON-parses `plugin_names.json` from disk on every invocation. Since this function is called during every `search_plugin_docs()` tool execution, it adds unnecessary disk I/O and JSON-parsing latency to every plugin-related query.

Present Behavior

```python
def is_valid_plugin(plugin_name: str) -> bool:
    def tokenize(item: str) -> str:
        item = item.replace('-', '')
        return item.replace(' ', '').lower()

    list_plugin_names_path = os.path.join(os.path.abspath(__file__),
                                          "..", "..", "data", "raw", "plugin_names.json")
    with open(list_plugin_names_path, "r", encoding="utf-8") as f:
        list_plugin_names = json.load(f)           # ← disk I/O + JSON parse every call

    for name in list_plugin_names:
        if tokenize(plugin_name) == tokenize(name):  # ← linear scan every call
            return True
    return False
```
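A quick way to see the per-call cost is a standalone micro-benchmark (hypothetical, not part of the repo): write a sample `plugin_names.json` to a temp directory and time the parse-on-every-call lookup against a cached `frozenset` lookup.

```python
import json
import os
import tempfile
import timeit

# Hypothetical setup: a sample plugin list on disk, standing in for plugin_names.json.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "plugin_names.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump([f"plugin-{i}" for i in range(500)], f)

def tokenize(item: str) -> str:
    return item.replace('-', '').replace(' ', '').lower()

def lookup_uncached(name: str) -> bool:
    with open(path, encoding="utf-8") as f:   # disk I/O + JSON parse on every call
        names = json.load(f)
    return any(tokenize(name) == tokenize(n) for n in names)

# Load once, tokenize once, keep a frozenset for O(1) membership checks.
with open(path, encoding="utf-8") as f:
    cached = frozenset(tokenize(n) for n in json.load(f))

def lookup_cached(name: str) -> bool:
    return tokenize(name) in cached

print("uncached:", timeit.timeit(lambda: lookup_uncached("plugin-499"), number=200))
print("cached:  ", timeit.timeit(lambda: lookup_cached("plugin-499"), number=200))
```

On any plugin list of realistic size, the cached version should be orders of magnitude faster, since it skips both the file read and the repeated tokenization.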

Why This Is a Problem

- `plugin_names.json` is static data; it never changes at runtime, so re-reading it from disk is wasted I/O.
- Each call performs a full linear scan over the entire plugin list, with tokenization applied to every element.
- In the agentic architecture (`_get_reply_simple_query_pipeline`), tool calls can be invoked multiple times per query due to the relevance-reformulation loop (up to `max_reformulate_iterations`), amplifying the overhead.

Proposed Fix

```python
import functools


def _tokenize(item: str) -> str:
    """Normalize a plugin name for comparison (strip '-' and ' ', lowercase)."""
    return item.replace('-', '').replace(' ', '').lower()


@functools.lru_cache(maxsize=1)
def _load_plugin_names() -> frozenset:
    """Load and cache the set of known plugin names (tokenized)."""
    list_plugin_names_path = os.path.join(
        os.path.abspath(__file__), "..", "..", "data", "raw", "plugin_names.json"
    )
    with open(list_plugin_names_path, "r", encoding="utf-8") as f:
        list_plugin_names = json.load(f)
    return frozenset(_tokenize(name) for name in list_plugin_names)


def is_valid_plugin(plugin_name: str) -> bool:
    return _tokenize(plugin_name) in _load_plugin_names()
```

Benefits:

- The file is read and parsed once, not on every call.
- `frozenset` gives O(1) lookup instead of an O(n) linear scan.
- `lru_cache` is thread-safe for reads, so it is safe with the existing threading model.
