AgentGuard: Zero-Trust Security Foundation for AI Agents
Seamlessly integrates with existing agent frameworks and supports modular deployment of existing rule-based and model-based security strategies.
- 🧩 Seamless Integration
- 🧱 Modular Security Strategies
- 🛡️ Multi‑Risk Coverage
- 👁️ Visual Audit
Important
This project is still under active development and may contain bugs. Contributions via Issues and PRs are welcome.
AgentGuard is a zero-trust security foundation for AI agents. Compatible with existing security strategies, it identifies and blocks security risks before each LLM call, after each LLM output, before each tool invocation, and after execution according to configurable safeguards, and it also supports post-hoc auditing of stored traces through pluggable custom auditors.
Today, AgentGuard covers several key technical areas highlighted in Anthropic's Zero Trust for AI Agents, including access control & privilege management, observability & auditing, and behavioral monitoring & response.
AgentGuard can be integrated into existing agent frameworks without modifying the underlying execution logic. Currently, it supports LangChain, AutoGen, OpenAI Agents SDK, and OpenClaw, and we are continuously expanding support for additional agent ecosystems and frameworks. See the documentation chapter on OpenClaw for the JavaScript-side integration details.
According to configured safeguards, AgentGuard can intervene before each LLM call, after each LLM output, before each tool invocation, and after execution to identify and block security risks across the full agent runtime. In addition to inline intervention, it also supports post-hoc auditing over stored runtime traces through pluggable custom auditors.
AgentGuard provides a unified interface for adapting existing security protections. Through its modular plugin architecture, rule-based and model-based strategies can be plugged in behind the same interface and enabled dynamically based on practical needs. Today, AgentGuard includes a built-in access-control strategy set, and users can build additional security policies through DSL definitions.
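As a rough illustration of placing rule-based and model-based strategies behind one interface, the sketch below defines a small protocol and two implementations. All names here (`ToolCall`, `Strategy`, `DenyListStrategy`, `ScoreStrategy`, `toy_scorer`) are hypothetical and are not AgentGuard's actual plugin API; see the plugin guides for the real interface.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class ToolCall:
    """Hypothetical event shape: a tool name plus its arguments."""
    name: str
    args: dict


class Strategy(Protocol):
    """Assumed common interface: return True to allow, False to block."""
    def check(self, call: ToolCall) -> bool: ...


class DenyListStrategy:
    """Rule-based: block any tool whose name is on a fixed deny list."""
    def __init__(self, denied: set[str]) -> None:
        self.denied = denied

    def check(self, call: ToolCall) -> bool:
        return call.name not in self.denied


class ScoreStrategy:
    """Model-based: block when a risk scorer reaches a threshold.

    The scorer is any callable returning a float in [0, 1]; a real
    deployment would call a classifier here.
    """
    def __init__(self, scorer: Callable[[ToolCall], float], threshold: float) -> None:
        self.scorer = scorer
        self.threshold = threshold

    def check(self, call: ToolCall) -> bool:
        return self.scorer(call) < self.threshold


def allowed(call: ToolCall, strategies: list[Strategy]) -> bool:
    """A call passes only if every enabled strategy allows it."""
    return all(s.check(call) for s in strategies)


def toy_scorer(call: ToolCall) -> float:
    """Stand-in for a model: flags shell-like arguments as risky."""
    return 0.9 if "rm -rf" in str(call.args) else 0.1


enabled = [DenyListStrategy({"drop_table"}), ScoreStrategy(toy_scorer, 0.5)]
print(allowed(ToolCall("read_file", {"path": "notes.txt"}), enabled))   # True
print(allowed(ToolCall("run_shell", {"cmd": "rm -rf /tmp/x"}), enabled))  # False
print(allowed(ToolCall("drop_table", {"table": "users"}), enabled))     # False
```

Because both classes expose the same `check` method, the list of enabled strategies can be changed at runtime without touching the calling code.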
AgentGuard can evaluate both individual tool calls and cross-step attack chains. By efficiently storing runtime context, it can detect behaviors such as "read from a database, then send email," "read a sensitive file, then upload it to an external HTTP endpoint," or "external input eventually flows into a shell command."
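To make the idea of a cross-step attack chain concrete, here is a minimal sketch that scans a stored trace for a source tool followed later by a sink tool. The `Event` type, the tool names, and the `CHAINS` table are assumptions chosen to mirror the examples above; AgentGuard's own storage and matching logic may work differently.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical trace entry: one tool call in a session."""
    step: int
    tool: str


# Hypothetical source -> sink chains, mirroring the examples in the text.
CHAINS = [
    ("query_database", "send_email"),
    ("read_sensitive_file", "http_upload"),
    ("fetch_external_input", "run_shell"),
]


def find_chains(trace: list[Event]) -> list[tuple[str, str, int, int]]:
    """Return (source, sink, source_step, sink_step) for each matched chain.

    A chain matches when the source tool appears at some step and the sink
    tool appears at any later step, with any number of calls in between.
    """
    hits = []
    first_seen: dict[str, int] = {}
    for event in trace:
        for source, sink in CHAINS:
            if event.tool == sink and source in first_seen:
                hits.append((source, sink, first_seen[source], event.step))
        # Record after checking so a tool never chains with itself.
        first_seen.setdefault(event.tool, event.step)
    return hits


trace = [
    Event(0, "read_sensitive_file"),
    Event(1, "summarize"),
    Event(2, "http_upload"),
    Event(3, "send_email"),
]
print(find_chains(trace))
# [('read_sensitive_file', 'http_upload', 0, 2)]
```

The final `send_email` call is not flagged because no `query_database` call precedes it, which is the difference between checking a single call and checking a chain.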
AgentGuard sits between the LLM-based planning engine and tools, and does not interfere with agent planning, reasoning, or task orchestration. Adapters are provided for several mainstream agent frameworks, allowing users to integrate AgentGuard with minimal code and without modifying framework internals or heavily refactoring existing agents. For frameworks not yet supported, AgentGuard offers a straightforward development interface for building custom adapters. See the client plugin guide and the server plugin guide.
Currently, we support the following agent frameworks: LangChain, AutoGen, OpenAI Agents SDK, and OpenClaw.
The integration guides for these frameworks live under docs/en/how-to-plugin/, including the dedicated OpenClaw chapter.
AgentGuard ships with a web console for managing agents. The visual interface lets users configure policies interactively without hand-writing DSL code. The policy editor relies heavily on dropdowns and other selection-based controls to reduce the policy configuration burden.
The runtime dashboard displays agent health, recent traffic, pending approval requests, and audit records. For any tool call that triggers a policy, users can inspect the matched rules, risk scores, final decisions, and the raw event/decision JSON, making it easy to understand why a particular call was denied or escalated for review.
The backend also supports pluggable custom auditors for post-hoc trace review. Shared auditor abstractions live under src/server/backend/audit/, while concrete auditors live under src/server/backend/audit/auditors/. See the documentation chapter on custom auditors.
AgentGuard uses a centralized control-plane architecture to govern distributed agent processes. Agents can be deployed across multiple nodes in the network, while policy configuration and runtime monitoring are managed centrally through the control server. This architecture is particularly well-suited for organizations that need unified management across a large fleet of agents.
Docker must be installed first.
Choose a host to serve as the control server, then clone AgentGuard:
git clone https://github.com/WhitzardAgent/AgentGuard.git
cd AgentGuard
First, create a plugin config file for the control server:
mkdir -p config
cat <<EOF > config/plugins.json
{
  "phases": {
    "llm_before": {
      "client": [],
      "server": []
    },
    "llm_after": {
      "client": [],
      "server": []
    },
    "tool_before": {
      "client": [],
      "server": [
        {
          "name": "rule_based_plugin",
          "env": {}
        }
      ]
    },
    "tool_after": {
      "client": [],
      "server": []
    }
  }
}
EOF
This config tells AgentGuard which plugins run in each runtime phase. In this quick start, only tool_before enables a server plugin: rule_based_plugin. The server therefore evaluates access-control rules right before a tool call is executed, while all other phases stay empty. This keeps the first demo simple: the client forwards tool-invocation decisions to the server, and the server uses the built-in rule-based plugin to match your policy rules and return an allow/deny decision.
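If you want to double-check which plugins a config enables before starting the server, a short script along these lines can summarize it. This is only a sketch: the `enabled_plugins` helper is hypothetical, and the config is embedded as a string so the example runs on its own (in practice you would read config/plugins.json).

```python
import json

# Same structure as config/plugins.json in the quick start.
CONFIG_TEXT = """
{
  "phases": {
    "llm_before":  {"client": [], "server": []},
    "llm_after":   {"client": [], "server": []},
    "tool_before": {"client": [], "server": [{"name": "rule_based_plugin", "env": {}}]},
    "tool_after":  {"client": [], "server": []}
  }
}
"""


def enabled_plugins(config: dict) -> dict[str, dict[str, list[str]]]:
    """Map each phase to the plugin names enabled on each side."""
    summary = {}
    for phase, sides in config["phases"].items():
        summary[phase] = {
            side: [plugin["name"] for plugin in plugins]
            for side, plugins in sides.items()
        }
    return summary


summary = enabled_plugins(json.loads(CONFIG_TEXT))
for phase, sides in summary.items():
    print(f"{phase}: client={sides['client']} server={sides['server']}")
```

Running it also confirms that the file is valid JSON, since `json.loads` raises an error on malformed input.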
Then create an access control policy:
mkdir -p rules
cat <<EOF > rules/block_email_send.rules
RULE: block_untrusted_email_send
TRACE: Retriever -> ...? -> Mailer
CONDITION: Retriever.name == "retrieve_doc"
AND Mailer.name == "send_email_to"
AND Retriever.id == 0
AND Mailer.addr != "[email protected]"
AND principal.trust_level < 2
POLICY: DENY
Severity: high
Category: data_exfiltration
Reason: "Low-trust principal cannot send document 0 to non-admin recipients"
EOF
This policy involves two agent tools: retrieve_doc and send_email_to, which retrieve a document by its id and send document content to a specified email address, respectively. The policy states that agents with a trust level below 2 may only send the confidential document (id=0) to [email protected]; sending it to any other recipient is denied.
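One way to read the rule is as an ordinary predicate over the call trace and the principal. The sketch below is a plain-Python interpretation, not the DSL engine: the `Call` type and `rule_denies` function are hypothetical, the email addresses are placeholders, and treating `...?` as "any number of intermediate calls" is an assumption.

```python
from dataclasses import dataclass

ADMIN_ADDR = "admin@example.com"  # placeholder; use the address from your rule


@dataclass
class Call:
    """Hypothetical record of one tool call: tool name and arguments."""
    name: str
    args: dict


def rule_denies(trace: list[Call], trust_level: int) -> bool:
    """Plain-Python reading of block_untrusted_email_send.

    Denies when retrieve_doc(id=0) is followed, at any later step, by
    send_email_to with a non-admin address, and the principal's trust
    level is below 2.
    """
    if trust_level >= 2:
        return False
    retrieved_doc_zero = False
    for call in trace:
        if call.name == "retrieve_doc" and call.args.get("id") == 0:
            retrieved_doc_zero = True
        elif (
            call.name == "send_email_to"
            and retrieved_doc_zero
            and call.args.get("addr") != ADMIN_ADDR
        ):
            return True
    return False


to_admin = [Call("retrieve_doc", {"id": 0}), Call("send_email_to", {"addr": ADMIN_ADDR})]
to_other = [Call("retrieve_doc", {"id": 0}), Call("send_email_to", {"addr": "user@example.com"})]

print(rule_denies(to_admin, trust_level=1))  # False
print(rule_denies(to_other, trust_level=1))  # True
print(rule_denies(to_other, trust_level=2))  # False
```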
AgentGuard also supports visual policy configuration with dynamic hot-reloading. See here for details.
Next, configure the environment variables for the control server:
Skip this step if the defaults are sufficient.
cp .env.example .env
vi .env
Set the server plugin config path in .env:
AGENTGUARD_SERVER_PLUGIN_CONFIG=./config/plugins.json
Start the control server:
./scripts/start.sh -d
The control server listens on port 38080.
The UI listens on port 38008.
Visit http://localhost:38008 to see the UI.
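To confirm that both services came up, you could probe the two ports from the control server host. The helper names below are hypothetical and the check only tests TCP reachability, not whether the services are healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Probe the two quick-start ports: 38080 (control server), 38008 (UI)."""
    return {
        "control_server": port_open(host, 38080),
        "ui": port_open(host, 38008),
    }


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```

Pass the control server's address instead of localhost when running the check from an agent host.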
On the agent host, run:
git clone https://github.com/WhitzardAgent/AgentGuard.git
cd AgentGuard
pip install -e .
The following LangChain example shows the required integration points.
Install the dependencies first:
pip install langchain==1.2.18
pip install langchain-openai==1.2.1
from langchain.agents import create_agent
from langchain.tools import tool

# 🚩 Import the AgentGuard client SDK
from agentguard import Guard, Principal

LLM_API_KEY = "<YOUR KEY>"  # Fill this manually
LLM_MODEL_NAME = "gpt-5.4-mini"

@tool
def retrieve_doc(id: int) -> str:
    """Retrieve a document by integer id."""
    return f"DOC#{id}: This is a mocked document body."

@tool
def send_email_to(doc: str, addr: str) -> str:
    """Send a document to an email address."""
    return f"Email has been sent to {addr}: {doc}"

def build_llm():
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(
        api_key=LLM_API_KEY,
        model=LLM_MODEL_NAME,
        temperature=0,
    )

def build_agent():
    return create_agent(
        model=build_llm(),
        tools=[retrieve_doc, send_email_to],
        system_prompt=(
            "You are a zero-shot ReAct style agent. Decide which tool to use, "
            "observe tool results, and continue until the user's task is complete."
        ),
    )

def run(agent, prompt):
    print("===================================")
    print(f"Prompt: {prompt}")
    result = agent.invoke(
        {
            "messages": [
                {
                    "role": "user",
                    "content": prompt,
                }
            ]
        }
    )
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    print(f"Output: {result['messages'][-1].content}")
    print("===================================\n")

if __name__ == "__main__":
    agent = build_agent()

    # 🚩 Load the guard client
    guard = Guard(
        remote_url="http://<Control Server IP>:38080",  # Replace with your control server IP and port
        mode="enforce",
        fail_open=False,
    )

    # 🚩 Create a principal for the agent
    principal = Principal(
        agent_id="langchain-remote-demo",
        session_id="langchain-remote-session",
        role="default",
        trust_level=1,
    )

    # 🚩 Start a session with the principal
    guard.start(principal=principal, goal="langchain remote runnable host demo")

    # 🚩 Attach the guard to the LangChain agent
    guard.attach_langchain(agent)

    try:
        run(agent, "Please retrieve document id=0 and send it to [email protected].")
        run(agent, "Please retrieve document id=0 and send it to [email protected].")
    finally:
        # 🚩 Close the guard
        guard.close()

Lines marked with 🚩 indicate where the AgentGuard client is inserted into the agent code. Make sure to replace the LLM API key and control server address with the values from your deployment.
Execute the LangChain agent script:
python <LANGCHAIN_AGENT_FILE>
The agent performs two different tasks. The first sends document 0 (simulating a confidential file) to the admin email address, which the policy permits. The second sends the same document to another user, which the policy forbids.
AgentGuard is expected to allow the first run and deny the second.
Expected output:
===================================
Prompt: Please retrieve document id=0 and send it to [email protected].
Output: Done — document 0 was retrieved and sent to [email protected].
===================================
===================================
Prompt: Please retrieve document id=0 and send it to [email protected].
Traceback (most recent call last):
File "...", line 83, in <module>
run(agent, "Please retrieve document id=0 and send it to [email protected].")
...
raise DecisionDenied(
agentguard.models.errors.DecisionDenied: block_untrusted_email_send
During task with name 'tools' and id 'ab34afab-e0f3-14f6-7517-bba2e47f0ea6'
Currently, AgentGuard enforces denials by raising an exception (hard blocking). A future version will introduce soft blocking, where the LLM receives an error message indicating the action was denied without terminating the agent process.
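Until soft blocking is available, a caller that wants to keep running after a denial can catch the exception itself. The sketch below uses a local stand-in class so it runs without AgentGuard installed; in real code you would import DecisionDenied from agentguard.models.errors, the path shown in the traceback above. The `invoke_guarded` and `fake_invoke` helpers are hypothetical.

```python
class DecisionDenied(Exception):
    """Local stand-in for agentguard.models.errors.DecisionDenied."""


def invoke_guarded(invoke, prompt: str) -> str:
    """Call invoke(prompt); turn a denial into a message instead of a crash."""
    try:
        return invoke(prompt)
    except DecisionDenied as exc:
        return f"Denied by policy: {exc}"


def fake_invoke(prompt: str) -> str:
    """Hypothetical agent: denies any prompt that mentions 'outsider'."""
    if "outsider" in prompt:
        raise DecisionDenied("block_untrusted_email_send")
    return "Done"


print(invoke_guarded(fake_invoke, "send doc 0 to admin"))
# Done
print(invoke_guarded(fake_invoke, "send doc 0 to outsider"))
# Denied by policy: block_untrusted_email_send
```

This handles the denial in the calling code only; the LLM itself never sees the error, which is the gap that soft blocking is meant to close.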
You can inspect the agent's runtime status and policy enforcement audit logs through the UI.
The UI also supports visual policy configuration and dynamic hot-reloading.
For additional deployment details, refer to the Documentation.
Current defenses for agent security mainly fall into two categories: malicious-intent detection at the model layer and tool-call behavior interception. The former strengthens the underlying LLM through fine-tuning or detects unsafe intent by analyzing the model's reasoning process; the latter enforces predefined security policies at tool invocation time based on call traces, arguments, and runtime context to identify, block, or escalate high-risk actions.
Given that model fine-tuning is often expensive to train and deploy, and that many models do not expose a complete reasoning trace, AgentGuard focuses on practical runtime controls around both LLM interaction and tool execution. This approach does not require changing the underlying model. Instead, it places security controls around what the agent exchanges with the model and actually does in the environment, which makes it easier to integrate into existing agent stacks and more practical for production deployment.
As illustrated below, existing tool-call-based defenses address parts of the problem, but they are often fragmented and optimized for narrow risk scenarios, such as dangerous command filtering, isolated prompt-injection mitigation, or limited auditing. In contrast, AgentGuard provides a unified framework that more systematically covers access control, runtime behavior monitoring, and execution auditing. This design is also more closely aligned with the enterprise agent-security goals emphasized in Anthropic's Zero Trust for AI Agents, including least-privilege permissions, constrained tool use, observable execution, and auditable policy enforcement.
The high-level architecture of AgentGuard is shown below.
- Client: With minimal code modifications, the AgentGuard client integrates into agent frameworks and can intercept before and after LLM calls, as well as before and after tool invocations. It can perform lightweight local filtering on the client side and forward events to the server for deeper inspection by configured plugins.
- Server: The server receives information from clients, uses configured plugins to evaluate agent actions against policies, produces policy decisions, and sends them back to clients. It also monitors agent status for administrative auditing.
- Plugin Extensibility: Both client and server support pluggable plugins. To add custom plugins, see the client plugin guide and the server plugin guide.
- Custom Auditor Extensibility: The backend also supports pluggable custom auditors for post-hoc trace review. Shared auditor abstractions live under src/server/backend/audit/, while concrete auditors live under src/server/backend/audit/auditors/. See the documentation chapter on custom auditors.
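The client/server split described above can be pictured as a two-stage decision: a cheap local check, then a server verdict. The following sketch is illustrative only. The function names are hypothetical, and the handling of an unreachable server (allow when fail-open, deny otherwise) is an assumption about what a flag such as fail_open typically means, not a statement of AgentGuard's exact behavior.

```python
from typing import Callable

Decision = str  # "allow" or "deny"


def decide(
    event: dict,
    local_filter: Callable[[dict], bool],
    ask_server: Callable[[dict], Decision],
    fail_open: bool = False,
) -> Decision:
    """Client-side flow: cheap local check first, then the server verdict."""
    if not local_filter(event):
        return "deny"  # blocked locally, no round trip needed
    try:
        return ask_server(event)
    except ConnectionError:
        # Assumed semantics: fail-open allows, fail-closed denies.
        return "allow" if fail_open else "deny"


def local_filter(event: dict) -> bool:
    """Hypothetical lightweight check: reject oversized arguments."""
    return len(str(event.get("args", ""))) < 10_000


def server_policy(event: dict) -> Decision:
    """Hypothetical server-side plugin: deny one tool by name."""
    return "deny" if event["tool"] == "run_shell" else "allow"


def unreachable(event: dict) -> Decision:
    """Simulates a control server that cannot be contacted."""
    raise ConnectionError("control server unreachable")


print(decide({"tool": "read_file", "args": {}}, local_filter, server_policy))  # allow
print(decide({"tool": "run_shell", "args": {}}, local_filter, server_policy))  # deny
print(decide({"tool": "read_file", "args": {}}, local_filter, unreachable))    # deny
print(decide({"tool": "read_file", "args": {}}, local_filter, unreachable, fail_open=True))  # allow
```

Denying by default when the server cannot be reached matches the zero-trust stance of the quick-start example, which sets fail_open=False.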
| Contributor | Role |
|---|---|
| Jiarun Dai | Asst. Prof., Fudan University |
| Jiaqi Luo | PhD Student, Fudan University |
| Songyang Peng | Master's Student, Fudan University |
| Zhile Chen | Master's Student, Fudan University |
| Zhuoxiang Shen | Eng.D Student, Fudan University |
| Xudong Pan | Asst. Prof., Fudan University |
| Geng Hong | Asst. Prof., Fudan University |
Listed in no particular order. Thanks to everyone who helped shape AgentGuard.
- Support more mainstream frameworks
- Support agent systems in more programming languages
- Enable protection for multi-agent scenarios
- Expand LLM input/output monitoring and plugin coverage
- Add more varied policy actions
- Provide automatic security policy recommendations
If you use AgentGuard in your research, please cite:
@misc{agentguard2026,
  title={AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent},
  author={Jiaqi Luo* and Songyang Peng* and Jiarun Dai and Zhile Chen and Zhuoxiang Shen and Geng Hong and Xudong Pan and Yuan Zhang and Min Yang},
  year={2026},
  eprint={2605.28071},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2605.28071},
}
This project is licensed under the GNU General Public License v3.0 (GPLv3).
- Built a modular zero-trust framework for agent security.
- Added compatibility for OpenClaw and JS client integrations.


