
WhitzardAgent/AgentGuard


🛡️ AgentGuard

Document Release v2.0 License

English | 简体中文

AgentGuard: Zero-Trust Security Foundation for AI Agents

Seamlessly integrates with existing agent frameworks and supports modular deployment of existing rule-based and model-based security strategies.

🧩 Seamless Integration
🧱 Modular Security Strategies
🛡️ Multi-Risk Coverage
👁️ Visual Audit

Important

This project is still under active development and may contain bugs. Contributions via Issues and PRs are welcome.

AgentGuard is a zero-trust security foundation for AI agents. Compatible with existing security strategies, it identifies and blocks security risks at four points, according to configurable safeguards: before each LLM call, after each LLM output, before each tool invocation, and after each tool execution. It also supports post-hoc auditing of stored traces through pluggable custom auditors.

Today, AgentGuard covers several key technical areas highlighted in Anthropic's Zero Trust for AI Agents, including access control & privilege management, observability & auditing, and behavioral monitoring & response.

[Figure: AgentGuard positioning]

AgentGuard can be integrated into existing agent frameworks without modifying the underlying execution logic. Currently, it supports LangChain, AutoGen, OpenAI Agents SDK, and OpenClaw, and we are continuously expanding support for additional agent ecosystems and frameworks. See the documentation chapter on OpenClaw for the JavaScript-side integration details.

✨ Features

1. Multi-Dimensional Security Protection

Multi-Phase Intervention

According to configured safeguards, AgentGuard can intervene before each LLM call, after each LLM output, before each tool invocation, and after each tool execution to identify and block security risks across the full agent runtime. In addition to inline intervention, it supports post-hoc auditing of stored runtime traces through pluggable custom auditors.
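The four interception points can be pictured as hooks wrapped around an agent's LLM and tool calls. The sketch below is a minimal, framework-free illustration; `PhaseHooks`, `Denied`, `guarded_llm_call`, and `guarded_tool_call` are hypothetical names, not AgentGuard's API, and the phase names are borrowed from the `config/plugins.json` keys used in the Quick Start.

```python
from typing import Any, Callable

# Phase names borrowed from the keys of config/plugins.json in the Quick Start.
PHASES = ("llm_before", "llm_after", "tool_before", "tool_after")


class Denied(Exception):
    """Hypothetical error raised when a check rejects an event."""


class PhaseHooks:
    """Hypothetical registry of per-phase checks; not AgentGuard's API."""

    def __init__(self) -> None:
        # Each check receives the event dict and returns True to allow it.
        self.checks: dict[str, list[Callable[[dict], bool]]] = {p: [] for p in PHASES}
        self.log: list[str] = []  # phases visited, in order

    def register(self, phase: str, check: Callable[[dict], bool]) -> None:
        self.checks[phase].append(check)

    def run(self, phase: str, event: dict) -> None:
        self.log.append(phase)
        for check in self.checks[phase]:
            if not check(event):
                raise Denied(f"{phase} denied {event}")


def guarded_llm_call(hooks: PhaseHooks, llm: Callable[[str], str], prompt: str) -> str:
    hooks.run("llm_before", {"prompt": prompt})  # before each LLM call
    output = llm(prompt)
    hooks.run("llm_after", {"output": output})  # after each LLM output
    return output


def guarded_tool_call(hooks: PhaseHooks, tool: Callable[..., Any], name: str, **args: Any) -> Any:
    hooks.run("tool_before", {"tool": name, "args": args})  # before each tool invocation
    result = tool(**args)
    hooks.run("tool_after", {"tool": name, "result": result})  # after tool execution
    return result


hooks = PhaseHooks()
hooks.register("tool_before", lambda event: event["tool"] != "delete_all")
guarded_llm_call(hooks, lambda prompt: "call echo", "say hi")
guarded_tool_call(hooks, lambda text: text, "echo", text="hi")
print(hooks.log)  # ['llm_before', 'llm_after', 'tool_before', 'tool_after']
```

Because a denying check raises before the tool runs, a blocked call never reaches the environment, which matches the hard-blocking behavior described later in the Quick Start.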

Seamless Reuse of Existing Security Strategies

AgentGuard provides a unified interface for adapting existing security protections. Through its modular plugin architecture, rule-based and model-based strategies can be plugged in behind the same interface and enabled dynamically based on practical needs. Today, AgentGuard includes a built-in access-control strategy set, and users can build additional security policies through DSL definitions.
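A minimal sketch of what "one interface, many strategies" can look like, assuming a hypothetical `SecurityPlugin` protocol with a single `evaluate(event)` method. The rule-based plugin denies listed tools; the "model-based" plugin is a stand-in that scores text with a toy keyword heuristic where a real deployment would call a classifier. None of these classes are part of AgentGuard.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Decision:
    allow: bool
    reason: str = ""


class SecurityPlugin(Protocol):
    """Hypothetical shared interface for rule-based and model-based plugins."""

    name: str

    def evaluate(self, event: dict) -> Decision: ...


class DenyListPlugin:
    """Rule-based: deny any call to a tool on the deny list."""

    name = "deny_list"

    def __init__(self, denied_tools: set[str]) -> None:
        self.denied_tools = denied_tools

    def evaluate(self, event: dict) -> Decision:
        if event.get("tool") in self.denied_tools:
            return Decision(False, f"tool {event['tool']} is deny-listed")
        return Decision(True)


class ScoreThresholdPlugin:
    """Model-based stand-in: a toy keyword score replaces a real classifier."""

    name = "score_threshold"
    SUSPICIOUS = ("password", "api_key", "rm -rf")

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def score(self, text: str) -> float:
        hits = sum(word in text.lower() for word in self.SUSPICIOUS)
        return min(1.0, hits / 2)

    def evaluate(self, event: dict) -> Decision:
        risk = self.score(str(event.get("args", "")))
        if risk >= self.threshold:
            return Decision(False, f"risk score {risk:.2f} >= {self.threshold}")
        return Decision(True)


def evaluate_all(plugins: list[SecurityPlugin], event: dict) -> Decision:
    # The first plugin that denies wins; otherwise the event is allowed.
    for plugin in plugins:
        decision = plugin.evaluate(event)
        if not decision.allow:
            return Decision(False, f"{plugin.name}: {decision.reason}")
    return Decision(True)


enabled = [DenyListPlugin({"shell_exec"}), ScoreThresholdPlugin(threshold=0.5)]
print(evaluate_all(enabled, {"tool": "search", "args": {"q": "weather"}}).allow)  # True
print(evaluate_all(enabled, {"tool": "shell_exec", "args": {}}).allow)  # False
```

Since both plugins expose the same method, enabling or disabling a strategy is just a matter of changing the `enabled` list.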

Single-Tool and Cross-Tool Protection

AgentGuard can evaluate both individual tool calls and cross-step attack chains. By efficiently storing runtime context, it can detect behaviors such as "read from a database, then send email," "read a sensitive file, then upload it to an external HTTP endpoint," or "external input eventually flows into a shell command."
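One way to detect the "external input eventually flows into a shell command" pattern is to remember the outputs of source tools and check whether any of them reappear in the arguments of a sink tool later in the same session. The sketch below uses naive substring matching and hypothetical tool names (`http_get`, `read_file`, `run_shell`, `http_post`); it illustrates the idea of keeping runtime context across steps and is not AgentGuard's detection logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SOURCES = {"http_get", "read_file"}  # hypothetical tools whose outputs are treated as tainted
SINKS = {"run_shell", "http_post"}  # hypothetical tools that must not receive tainted data


@dataclass
class FlowTracker:
    # (source tool, output) pairs recorded during the session
    tainted: list[tuple[str, str]] = field(default_factory=list)

    def record_result(self, tool: str, output: str) -> None:
        if tool in SOURCES and output:
            self.tainted.append((tool, output))

    def check_call(self, tool: str, args: dict[str, str]) -> str | None:
        """Return a description of the flow if a sink receives tainted data."""
        if tool not in SINKS:
            return None
        for value in args.values():
            for source, output in self.tainted:
                if output in value:
                    return f"{source} -> {tool}"
        return None


tracker = FlowTracker()
tracker.record_result("http_get", "touch /tmp/pwned")
print(tracker.check_call("run_shell", {"cmd": "echo start; touch /tmp/pwned"}))  # http_get -> run_shell
print(tracker.check_call("run_shell", {"cmd": "ls -la"}))  # None
```

Substring matching misses transformed data (encoded, summarized, or split values), which is why cross-step checks in practice also rely on call ordering and policy context rather than exact content alone.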

2. Seamless Integration with Agent Frameworks

AgentGuard sits between the LLM-based planning engine and tools, and does not interfere with agent planning, reasoning, or task orchestration. Adapters are provided for several mainstream agent frameworks, allowing users to integrate AgentGuard with minimal code and without modifying framework internals or heavily refactoring existing agents. For frameworks not yet supported, AgentGuard offers a straightforward development interface for building custom adapters. See the client plugin guide and the server plugin guide.

Currently, we support the following agent frameworks:

  • LangChain
  • AutoGen
  • OpenAI Agents SDK
  • OpenClaw

The integration guides for these frameworks live under docs/en/how-to-plugin/, including the dedicated OpenClaw chapter.

3. Visual Policy Configuration & Audit

AgentGuard ships with a web console for managing agents. The visual interface lets users configure policies interactively without hand-writing DSL code. The policy editor relies heavily on dropdowns and other selection-based controls to reduce the policy configuration burden.

The runtime dashboard displays agent health, recent traffic, pending approval requests, and audit records. For any tool call that triggers a policy, users can inspect the matched rules, risk scores, final decisions, and the raw event/decision JSON, making it easy to understand why a particular call was denied or escalated for review.

Custom Auditor Extensibility

The backend also supports pluggable custom auditors for post-hoc trace review. Shared auditor abstractions live under src/server/backend/audit/, while concrete auditors live under src/server/backend/audit/auditors/. See the documentation chapter on custom auditors.
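Unlike the inline plugins, an auditor runs after the fact over stored traces. The sketch below assumes a hypothetical `Auditor` base class and a trace stored as a list of event dicts with a `decision` field; the real abstractions under src/server/backend/audit/ may look quite different, so treat the names and the trace shape as illustrative.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Finding:
    session_id: str
    step: int
    message: str


class Auditor(ABC):
    """Hypothetical base class: inspects one stored trace and returns findings."""

    name = "base"

    @abstractmethod
    def audit(self, session_id: str, trace: list[dict]) -> list[Finding]: ...


class RepeatedDenialAuditor(Auditor):
    """Flags sessions whose agent keeps retrying calls that were denied."""

    name = "repeated_denial"

    def __init__(self, max_denials: int = 2) -> None:
        self.max_denials = max_denials

    def audit(self, session_id: str, trace: list[dict]) -> list[Finding]:
        findings = []
        denials = 0
        for step, event in enumerate(trace):
            if event.get("decision") == "deny":
                denials += 1
                if denials > self.max_denials:
                    findings.append(Finding(session_id, step, f"{denials} denials so far"))
        return findings


def run_auditors(auditors: list[Auditor], traces: dict[str, list[dict]]) -> list[Finding]:
    # Apply every auditor to every stored session trace.
    return [f for sid, trace in traces.items() for a in auditors for f in a.audit(sid, trace)]


traces = {
    "s1": [{"tool": "send_email_to", "decision": "deny"}] * 3,
    "s2": [{"tool": "retrieve_doc", "decision": "allow"}],
}
for finding in run_auditors([RepeatedDenialAuditor(max_denials=2)], traces):
    print(finding)
```

Keeping auditors out of the request path means they can run expensive or cross-session analyses without adding latency to agent tool calls.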

4. Cluster Management

AgentGuard uses a centralized control-plane architecture to govern distributed agent processes. Agents can be deployed across multiple nodes in the network, while policy configuration and runtime monitoring are managed centrally through the control server. This architecture is particularly well-suited for organizations that need unified management across a large fleet of agents.

🚀 Quick Start

1. Write the Plugin Config and Access Control Policy, Then Start the Control Server

Docker must be installed first.

Choose a host to serve as the control server, then clone AgentGuard:

git clone https://github.com/WhitzardAgent/AgentGuard.git
cd AgentGuard

First, create a plugin config file for the control server:

mkdir -p config

cat <<EOF > config/plugins.json
{
  "phases": {
    "llm_before": {
      "client": [],
      "server": []
    },
    "llm_after": {
      "client": [],
      "server": []
    },
    "tool_before": {
      "client": [],
      "server": [
        {
          "name": "rule_based_plugin",
          "env": {}
        }
      ]
    },
    "tool_after": {
      "client": [],
      "server": []
    }
  }
}
EOF

This config tells AgentGuard which plugins run in each runtime phase. In this quick start, only tool_before enables one server plugin: rule_based_plugin. That means the server evaluates access-control rules right before a tool call is executed, while all other phases stay empty. This keeps the first demo simple: the client forwards each pending tool invocation to the server, and the server uses the built-in rule-based plugin to match your policy rules and return an allow/deny decision.
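To check which plugins a config enables, a few lines of Python are enough. The sketch below reads the `phases` layout shown above (client and server lists per phase, each entry with a `name` and `env`) and reports the enabled plugin names; `summarize_plugins` and `load_config` are hypothetical helpers, not part of AgentGuard.

```python
import json
from pathlib import Path

PHASES = ("llm_before", "llm_after", "tool_before", "tool_after")


def load_config(path: str = "config/plugins.json") -> dict:
    """Read the plugin config file written in the step above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_plugins(config: dict) -> dict[str, dict[str, list[str]]]:
    """Map each phase to the plugin names enabled on the client and server side."""
    phases = config.get("phases", {})
    summary = {}
    for phase in PHASES:
        sides = phases.get(phase, {})
        summary[phase] = {
            side: [entry["name"] for entry in sides.get(side, [])]
            for side in ("client", "server")
        }
    return summary


# Inline copy of the relevant part of the quick-start config; with the file in
# place, summarize_plugins(load_config()) gives the same result.
sample = {"phases": {"tool_before": {"client": [], "server": [{"name": "rule_based_plugin", "env": {}}]}}}
for phase, sides in summarize_plugins(sample).items():
    print(f"{phase:12} client={sides['client']} server={sides['server']}")
```

Phases missing from the file are reported as empty, which matches how the quick-start config leaves every phase except tool_before without plugins.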

Then create an access control policy:

mkdir -p rules

cat <<EOF > rules/block_email_send.rules
RULE: block_untrusted_email_send
TRACE: Retriever -> ...? -> Mailer
CONDITION: Retriever.name == "retrieve_doc"
           AND Mailer.name == "send_email_to"
           AND Retriever.id == 0
           AND Mailer.addr != "[email protected]"
           AND principal.trust_level < 2
POLICY: DENY
Severity: high
Category: data_exfiltration
Reason: "Low-trust principal cannot send document 0 to non-admin recipients"
EOF

This policy involves two agent tools: retrieve_doc and send_email_to, which retrieve a document by its id and send document content to a specified email address, respectively. The policy states that agents with a trust level below 2 may only send the confidential document (id=0) to [email protected]; sending it to any other recipient is denied.
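To make the rule's semantics concrete, here is a plain-Python reading of `block_untrusted_email_send`. It is a hand-written sketch, not AgentGuard's DSL engine: interpreting `...?` as "any number of intermediate calls, possibly none" is an assumption, and the permitted recipient is passed in as a parameter, with `admin@example.com` used as a placeholder.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


def block_untrusted_email_send(trace: list[ToolCall], trust_level: int, admin_addr: str) -> bool:
    """Return True if the rule fires (DENY) for the latest call in `trace`.

    Assumed reading of TRACE `Retriever -> ...? -> Mailer`: the latest call is
    the Mailer, and some earlier call in the same session is the Retriever.
    """
    if not trace:
        return False
    mailer = trace[-1]
    # Mailer.name == "send_email_to" AND Mailer.addr != <admin address>
    if mailer.name != "send_email_to" or mailer.args.get("addr") == admin_addr:
        return False
    # principal.trust_level < 2
    if trust_level >= 2:
        return False
    # Retriever.name == "retrieve_doc" AND Retriever.id == 0, anywhere earlier
    return any(call.name == "retrieve_doc" and call.args.get("id") == 0 for call in trace[:-1])


ADMIN = "admin@example.com"  # placeholder for the rule's permitted recipient
history = [ToolCall("retrieve_doc", {"id": 0})]
to_admin = history + [ToolCall("send_email_to", {"doc": "...", "addr": ADMIN})]
to_other = history + [ToolCall("send_email_to", {"doc": "...", "addr": "bob@example.com"})]
print(block_untrusted_email_send(to_admin, trust_level=1, admin_addr=ADMIN))  # False (allowed)
print(block_untrusted_email_send(to_other, trust_level=1, admin_addr=ADMIN))  # True (denied)
print(block_untrusted_email_send(to_other, trust_level=2, admin_addr=ADMIN))  # False (trusted principal)
```

The three calls correspond to the two runs in the Quick Start plus a higher-trust principal, for which the same email is allowed.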

AgentGuard also supports visual policy configuration with dynamic hot-reloading; see the documentation for details.

Next, configure the environment variables for the control server:

Skip this step if the defaults are sufficient.

cp .env.example .env
vi .env

Set the server plugin config path in .env:

AGENTGUARD_SERVER_PLUGIN_CONFIG=./config/plugins.json

Start the control server:

./scripts/start.sh -d

The control server listens on port 38080. The UI listens on port 38008.

Visit http://localhost:38008 to see the UI.
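If the UI does not load, a quick way to confirm that both services are listening is a TCP connection check. The sketch below uses only the standard library; the host and ports are the defaults mentioned above, so adjust them if your deployment uses different values.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for label, port in (("control server", 38080), ("UI", 38008)):
        state = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{label} on port {port}: {state}")
```

Run it on the control server host, or replace "localhost" with the server's IP to check reachability from an agent host.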

2. Agent-Side Setup

On the agent host, run:

git clone https://github.com/WhitzardAgent/AgentGuard.git
cd AgentGuard
pip install -e .

The following LangChain example shows the required integration points:

Install the dependencies first:

pip install langchain==1.2.18
pip install langchain-openai==1.2.1

from langchain.agents import create_agent
from langchain.tools import tool

# 🚩 Import the AgentGuard client SDK
from agentguard import Guard, Principal

LLM_API_KEY = "<YOUR KEY>"         # Fill this manually
LLM_MODEL_NAME = "gpt-5.4-mini"

@tool
def retrieve_doc(id: int) -> str:
    """Retrieve a document by integer id."""
    return f"DOC#{id}: This is a mocked document body."

@tool
def send_email_to(doc: str, addr: str) -> str:
    """Send a document to an email address."""
    return f"Email has been sent to {addr}: {doc}"

def build_llm():
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(
        api_key=LLM_API_KEY,
        model=LLM_MODEL_NAME,
        temperature=0,
    )

def build_agent():
    return create_agent(
        model=build_llm(),
        tools=[retrieve_doc, send_email_to],
        system_prompt=(
            "You are a zero-shot ReAct style agent. Decide which tool to use, "
            "observe tool results, and continue until the user's task is complete."
        ),
    )

def run(agent, prompt):
    print("===================================")
    print(f"Prompt: {prompt}")
    result = agent.invoke(
        {
            "messages": [
                {
                    "role": "user",
                    "content": prompt,
                }
            ]
        }
    )
    print(f"Output: {result['messages'][-1].content}")
    print("===================================\n")

if __name__ == "__main__":
    agent = build_agent()

    # 🚩 Load the guard client
    guard = Guard(
        remote_url="http://<Control Server IP>:38080",      # Replace with your control server IP and port
        mode="enforce",
        fail_open=False,
    )

    # 🚩 Create a principal for the agent
    principal = Principal(
        agent_id="langchain-remote-demo",
        session_id="langchain-remote-session",
        role="default",
        trust_level=1,
    )

    # 🚩 Start a session with the principal
    guard.start(principal=principal, goal="langchain remote runnable host demo")

    # 🚩 Attach the guard to the LangChain agent
    guard.attach_langchain(agent)

    try:
        run(agent, "Please retrieve document id=0 and send it to [email protected].")
        run(agent, "Please retrieve document id=0 and send it to [email protected].")
    finally:
        # 🚩 Close the guard
        guard.close()

Lines marked with 🚩 indicate where the AgentGuard client is inserted into the agent code. Make sure to replace the LLM API key and control server address with the values from your deployment.

3. Run the Agent

Execute the LangChain agent script:

python <LANGCHAIN_AGENT_FILE>

The agent performs two different tasks. The first sends document 0 (simulating a confidential file) to the admin email address, which the policy permits. The second sends the same document to another user, which the policy forbids.

AgentGuard is expected to allow the first run and deny the second.

Expected output:

===================================
Prompt: Please retrieve document id=0 and send it to [email protected].
Output: Done — document 0 was retrieved and sent to [email protected].
===================================

===================================
Prompt: Please retrieve document id=0 and send it to [email protected].
Traceback (most recent call last):
  File "...", line 83, in <module>
    run(agent, "Please retrieve document id=0 and send it to [email protected].")
  ...
    raise DecisionDenied(
agentguard.models.errors.DecisionDenied: block_untrusted_email_send
During task with name 'tools' and id 'ab34afab-e0f3-14f6-7517-bba2e47f0ea6'

Currently, AgentGuard enforces denials by raising an exception (hard blocking). A future version will introduce soft blocking, where the LLM receives an error message indicating the action was denied without terminating the agent process.
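Until soft blocking is available, one application-level workaround is to catch the exception around each run and report the denial instead of crashing. The sketch below imports `DecisionDenied` from the module path shown in the traceback above (`agentguard.models.errors`) and falls back to a local stand-in class when AgentGuard is not installed; `run_or_report` is a hypothetical helper, and `FakeAgent` exists only so the example runs on its own.

```python
from types import SimpleNamespace

try:
    from agentguard.models.errors import DecisionDenied  # path taken from the traceback
except ImportError:  # stand-in so this sketch runs without AgentGuard installed
    class DecisionDenied(Exception):
        pass


def run_or_report(agent, prompt: str) -> str:
    """Invoke the agent; turn a policy denial into a message instead of a crash."""
    try:
        result = agent.invoke({"messages": [{"role": "user", "content": prompt}]})
    except DecisionDenied as exc:
        return f"Blocked by policy: {exc}"
    return result["messages"][-1].content


class FakeAgent:
    """Minimal stand-in for a LangChain agent, used only for this example."""

    def __init__(self, deny: bool) -> None:
        self.deny = deny

    def invoke(self, state: dict) -> dict:
        if self.deny:
            raise DecisionDenied("block_untrusted_email_send")
        return {"messages": [SimpleNamespace(content="sent")]}


print(run_or_report(FakeAgent(deny=False), "send doc 0 to the admin"))
print(run_or_report(FakeAgent(deny=True), "send doc 0 to someone else"))
```

This keeps a multi-task script running after a denial, but unlike the planned soft blocking it does not feed the denial back to the LLM within the same run.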

4. Manage the Agent's Runtime with UI

You can inspect the agent's runtime status and policy enforcement audit logs through the UI.

The UI also supports visual policy configuration and dynamic hot-reloading.

For additional deployment details, refer to the Documentation.

🎬 Demo Video

Demo.1.mp4

🏆 Advantages over Existing Frameworks

Current defenses for agent security mainly fall into two categories: malicious-intent detection at the model layer and tool-call behavior interception. The former strengthens the underlying LLM through fine-tuning or detects unsafe intent by analyzing the model's reasoning process; the latter enforces predefined security policies at tool invocation time based on call traces, arguments, and runtime context to identify, block, or escalate high-risk actions.

Because fine-tuned models are often expensive to train and deploy, and many models do not expose a complete reasoning trace, AgentGuard focuses on practical runtime controls around both LLM interaction and tool execution. This approach does not require changing the underlying model. Instead, it places security controls around what the agent exchanges with the model and what it actually does in the environment, which makes it easier to integrate into existing agent stacks and more practical for production deployment.

As illustrated below, existing tool-call-based defenses address parts of the problem, but they are often fragmented and optimized for narrow risk scenarios, such as dangerous command filtering, isolated prompt-injection mitigation, or limited auditing. In contrast, AgentGuard provides a unified framework that more systematically covers access control, runtime behavior monitoring, and execution auditing. This design is also more closely aligned with the enterprise agent-security goals emphasized in Anthropic's Zero Trust for AI Agents, including least-privilege permissions, constrained tool use, observable execution, and auditable policy enforcement.

[Figure: Advantages over existing frameworks]

🏗️ Architecture

The high-level architecture of AgentGuard is shown below.

[Figure: AgentGuard architecture]

  • Client: With minimal code modifications, the AgentGuard client integrates into agent frameworks and intercepts events before and after LLM calls, as well as before and after tool invocations. It can perform lightweight local filtering on the client side and forward events to the server for deeper inspection by configured plugins.
  • Server: The server receives information from clients, uses configured plugins to evaluate agent actions against policies, produces policy decisions, and sends them back to clients. It also monitors agent status for administrative auditing.
  • Plugin Extensibility: Both client and server support pluggable plugins. To add custom plugins, see the client plugin guide and the server plugin guide.
  • Custom Auditor Extensibility: The backend also supports pluggable custom auditors for post-hoc trace review. Shared auditor abstractions live under src/server/backend/audit/, while concrete auditors live under src/server/backend/audit/auditors/. See the documentation chapter on custom auditors.
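The division of labor between client and server can be sketched as a two-stage check: cheap local rules first, then a remote decision. In the sketch, `remote_decide` stands in for the HTTP call to the control server, `LOCAL_DENY` is a hypothetical client-side deny list, and the handling of server errors mirrors what the `fail_open=False` argument to `Guard` in the Quick Start suggests (allow on error when `True`, deny when `False`); that behavior is an assumption, not documented AgentGuard semantics.

```python
from typing import Callable

LOCAL_DENY = {"format_disk"}  # hypothetical client-side deny list


def decide(
    event: dict,
    remote_decide: Callable[[dict], bool],
    fail_open: bool = False,
) -> tuple[bool, str]:
    """Return (allowed, where the decision was made)."""
    # Stage 1: lightweight local filtering, no network round trip.
    if event.get("tool") in LOCAL_DENY:
        return False, "client"
    # Stage 2: forward to the server for plugin-based evaluation.
    try:
        return remote_decide(event), "server"
    except ConnectionError:
        # Server unreachable: allow only if the client is configured to fail open.
        return fail_open, "fallback"


def server_down(event: dict) -> bool:
    raise ConnectionError("control server unreachable")


print(decide({"tool": "format_disk"}, lambda e: True))  # (False, 'client')
print(decide({"tool": "search"}, lambda e: e["tool"] != "send_email"))  # (True, 'server')
print(decide({"tool": "search"}, server_down, fail_open=False))  # (False, 'fallback')
```

Failing closed trades availability for safety: when the control server is down, guarded tool calls are refused rather than silently executed unchecked.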

👥 Contributors

Contributor | Role
--- | ---
Jiarun Dai | Asst. Prof., Fudan University
Jiaqi Luo | PhD Student, Fudan University
Songyang Peng | Master's Student, Fudan University
Zhile Chen | Master's Student, Fudan University
Zhuoxiang Shen | EngD Student, Fudan University
Xudong Pan | Asst. Prof., Fudan University
Geng Hong | Asst. Prof., Fudan University

Listed in no particular order. Thanks to everyone who helped shape AgentGuard.

🎯 Roadmap

  • Support more mainstream frameworks
  • Support agent systems in more programming languages
  • Enable protection for multi-agent scenarios
  • Expand LLM input/output monitoring and plugin coverage
  • Add more varied policy actions
  • Provide automatic security policy recommendations

📚 Citation

If you use AgentGuard in your research, please cite:

@misc{agentguard2026,
      title={AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent},
      author={Jiaqi Luo* and Songyang Peng* and Jiarun Dai and Zhile Chen and Zhuoxiang Shen and Geng Hong and Xudong Pan and Yuan Zhang and Min Yang},
      year={2026},
      eprint={2605.28071},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2605.28071},
}

📜 License

This project is licensed under the GNU General Public License v3.0 (GPLv3).

📝 Version Log

v2.0

  • Built a modular zero-trust framework for agent security.
  • Added compatibility for OpenClaw and JS client integrations.