Merged
40 commits
53694c6
code for harness
alexps9 Jun 5, 2026
9444583
Merge pull request #1 from alexps9/dev/v2.0
alexps9 Jun 5, 2026
2c8de9b
refactor: migrate to client-server runtime security framework (AgentG…
alexps9 Jun 9, 2026
1bdc84f
Merge pull request #2 from alexps9/dev/v2.0
alexps9 Jun 9, 2026
b8f0bdf
update client
lhahah Jun 9, 2026
adc2128
feat: add configurable checker runtime
lhahah Jun 10, 2026
bec1622
Add a dynamic registration mechanism for checker, unify API naming, a…
lhahah Jun 10, 2026
093dfe7
feat: require API key for backend frontend routes
lhahah Jun 10, 2026
2267562
llm-wrapper
kachitoritai-bilibili-user Jun 11, 2026
4a36952
adapter & backend api modification
lhahah Jun 12, 2026
6d2ffa8
Update readme & docs & client
lhahah Jun 14, 2026
980d2bf
js client
kachitoritai-bilibili-user Jun 14, 2026
785bff6
Use composite session identity across server sessions
lhahah Jun 15, 2026
269a011
Align JS client session identity with Python client
lhahah Jun 15, 2026
18695eb
Simplify client session registration and update docs
lhahah Jun 15, 2026
a01bad0
Add backend auditor registry and frontend APIs
lhahah Jun 15, 2026
bb2227d
frontend modification
lhahah Jun 15, 2026
01bd0df
marked builtin rules
lhahah Jun 15, 2026
6e8c29a
update auditor
lhahah Jun 15, 2026
52b561a
frontend checker page
kachitoritai-bilibili-user Jun 15, 2026
aeabf48
frontend checkers page
kachitoritai-bilibili-user Jun 16, 2026
feca73d
checkers modification
kachitoritai-bilibili-user Jun 16, 2026
3b072f2
checkers -> plugins
kachitoritai-bilibili-user Jun 16, 2026
5e74f31
plugin code residue handling
kachitoritai-bilibili-user Jun 17, 2026
7f3af98
plugin code residue complete clearing for both python & js
kachitoritai-bilibili-user Jun 17, 2026
c80d3d6
frontend modification
kachitoritai-bilibili-user Jun 17, 2026
bd435f2
Rename checkers to plugins and update docs
lhahah Jun 17, 2026
f57e386
Refactor runtime plugins and unify adapter patch hooks
lhahah Jun 18, 2026
b13ee83
plugins(adapter) modification
kachitoritai-bilibili-user Jun 18, 2026
197b73b
plugins(adapter) modification
kachitoritai-bilibili-user Jun 18, 2026
0ffe7a5
openclaw hook adapter
kachitoritai-bilibili-user Jun 18, 2026
7d4a95b
Update plugin docs and runtime event payloads
lhahah Jun 18, 2026
0b88d61
Add Guard compatibility and legacy rules support
lhahah Jun 19, 2026
b955871
client adapter modification
kachitoritai-bilibili-user Jun 19, 2026
3b0d362
Block LLM calls on pre-guard decisions
lhahah Jun 21, 2026
3c0ed2a
Refactor agent adapter bindings and docs
lhahah Jun 21, 2026
645565d
openclaw adapter modification
kachitoritai-bilibili-user Jun 21, 2026
7fc85c3
Add Openclaw integration docs
lhahah Jun 21, 2026
6bdec7e
openclaw adapter modification
kachitoritai-bilibili-user Jun 21, 2026
de9ff1a
python adapter modification
kachitoritai-bilibili-user Jun 21, 2026
2 changes: 1 addition & 1 deletion .env.example
@@ -21,7 +21,7 @@ AGENTGUARD_PORT=38080 # (optional) default: 38080
AGENTGUARD_MODE=enforce # (optional) enforce | monitor | dry_run
AGENTGUARD_RUNTIME_MODE=sync # (optional) sync | async
AGENTGUARD_LOG_LEVEL=info # (optional) debug | info | warning | error
AGENTGUARD_API_KEY= # (optional) blank = no X-Api-Key check
AGENTGUARD_API_KEY=sk-agentguard-backend-X9m42Vq7Tz8nL3pA6cR0yH5uJ1sWfKdE

# Policy rules directory / file.
# Use a relative path — works for both Docker (CWD=/opt/agentguard) and
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,6 +4,8 @@ __pycache__/
*$py.class
*.pyo
*.pyd
*.cidex
.codex/

# Distribution / packaging
dist/
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "third_party/AgentDoG"]
path = third_party/AgentDoG
url = [email protected]:AI45Lab/AgentDoG.git
52 changes: 16 additions & 36 deletions Dockerfile
@@ -1,53 +1,33 @@
# AgentGuard runtime image — multi-stage, single binary surface.

# ─── Stage 1: build the wheel & dependencies into a venv ───
FROM python:3.11 AS builder
# AgentGuard server/runtime image. The server image only carries server + shared
# source; client code is not required for backend imports.
FROM python:3.11-slim AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
PIP_DISABLE_PIP_VERSION_CHECK=1 \
AGENTGUARD_HOST=0.0.0.0 \
AGENTGUARD_PORT=38080 \
PYTHONPATH="/opt/agentguard/src:/opt/agentguard/src/server:/opt/agentguard"

RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential libpq-dev \
&& apt-get install -y --no-install-recommends curl tini \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /opt/agentguard

COPY pyproject.toml README.md README_CN.md ./
COPY agentguard ./agentguard

RUN python -m venv /opt/venv \
&& /opt/venv/bin/pip install --upgrade pip \
&& /opt/venv/bin/pip install ".[server,redis,postgres,dynamic]"


# ─── Stage 2: lean runtime ───
FROM python:3.11 AS runtime

ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PATH="/opt/venv/bin:$PATH" \
AGENTGUARD_HOST=0.0.0.0 \
AGENTGUARD_PORT=38080

RUN apt-get update \
&& apt-get install -y --no-install-recommends libpq5 curl tini \
&& rm -rf /var/lib/apt/lists/* \
&& groupadd --system agentguard \
&& useradd --system --gid agentguard --home /home/agentguard --create-home agentguard
# Dependencies first for better layer caching.
COPY pyproject.toml README.md ./
RUN pip install "pydantic>=2.5,<3.0" "fastapi>=0.110" "uvicorn>=0.27"

WORKDIR /opt/agentguard

COPY --from=builder /opt/venv /opt/venv
COPY agentguard ./agentguard
# Server source + shared source (PYTHONPATH layout, no editable install needed).
COPY src/server ./src/server
COPY src/shared ./src/shared
COPY rules ./rules
COPY frontend ./frontend
COPY plugins ./plugins
COPY scripts ./scripts

RUN chown -R agentguard:agentguard /opt/agentguard

USER agentguard
RUN chmod +x scripts/*.sh 2>/dev/null || true

EXPOSE 38080

107 changes: 74 additions & 33 deletions README.md
@@ -5,7 +5,7 @@
<img src="https://img.shields.io/badge/Document-Docs-0ea5e9?style=for-the-badge&logo=gitbook&logoColor=white" alt="Document" />
</a>
<a href="https://github.com/WhitzardAgent/AgentGuard/releases">
<img src="https://img.shields.io/badge/Release-v1.0-111827?style=for-the-badge&logo=github&logoColor=white" alt="Release v1.0" />
<img src="https://img.shields.io/badge/Release-v2.0-111827?style=for-the-badge&logo=github&logoColor=white" alt="Release v2.0" />
</a>
<a href="./LICENSE">
<img src="https://img.shields.io/badge/License-GPL%20v3-16a34a?style=for-the-badge&logo=open-source-initiative&logoColor=white" alt="License" />
@@ -18,26 +18,30 @@
</p>

<p align="center">
<strong>AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent</strong>
<strong>AgentGuard: Zero-Trust Security Foundation for AI Agents</strong>
</p>

<p align="center">
Declarative policy enforcement, provenance-aware decisions, and human-in-the-loop safety for tool invocations.
Seamlessly integrates with existing agent frameworks and supports modular deployment of existing rule-based and model-based security strategies.
</p>

<table align="center" width="100%" cellspacing="0" cellpadding="0">
<tr>
<td align="center" width="30%" style="padding: 20px 18px; border: 1px solid #e5e7eb; border-radius: 18px; background: #ffffff;">
<td align="center" width="25%" style="padding: 20px 18px; border: 1px solid #e5e7eb; border-radius: 18px; background: #ffffff;">
<div style="font-size: 28px; line-height: 1; margin-bottom: 10px;">🧩</div>
<small><strong>Seamless&nbsp;Integration</strong></small>
</td>
<td align="center" width="30%" style="padding: 20px 18px; border: 1px solid #e5e7eb; border-radius: 18px; background: #ffffff;">
<td align="center" width="25%" style="padding: 20px 18px; border: 1px solid #e5e7eb; border-radius: 18px; background: #ffffff;">
<div style="font-size: 28px; line-height: 1; margin-bottom: 10px;">🧱</div>
<small><strong>Modular&nbsp;Security&nbsp;Strategies</strong></small>
</td>
<td align="center" width="25%" style="padding: 20px 18px; border: 1px solid #e5e7eb; border-radius: 18px; background: #ffffff;">
<div style="font-size: 28px; line-height: 1; margin-bottom: 10px;">🛡️</div>
<small><strong>Multi&#8209;Risk&nbsp;Coverage</strong></small>
</td>
<td align="center" width="40%" style="padding: 20px 18px; border: 1px solid #e5e7eb; border-radius: 18px; background: #ffffff;">
<td align="center" width="25%" style="padding: 20px 18px; border: 1px solid #e5e7eb; border-radius: 18px; background: #ffffff;">
<div style="font-size: 28px; line-height: 1; margin-bottom: 10px;">👁️</div>
<small><strong>Visual&nbsp;Rule&nbsp;Setup&nbsp;&amp;&nbsp;Audit</strong></small>
<small><strong>Visual&nbsp;Audit</strong></small>
</td>
</tr>
</table>
@@ -46,39 +50,29 @@
> [!IMPORTANT]
> This project is still under active development and may contain bugs. Contributions via Issues and PRs are welcome.

AgentGuard is an attribute-based access control framework for agent tool calls that sits between an LLM-based planning engine and the tools it invokes. Before each tool call is executed, and again after it completes, AgentGuard evaluates the agent's behavior against declarative policies to decide whether the action should proceed as-is, be blocked, or be routed for human check.
AgentGuard is a zero-trust security foundation for AI agents. Compatible with existing security strategies, it identifies and blocks security risks before each LLM call, after each LLM output, before each tool invocation, and after execution according to configurable safeguards, and it also supports post-hoc auditing of stored traces through pluggable custom auditors.

Today, AgentGuard covers several key technical areas highlighted in Anthropic's [Zero Trust for AI Agents](https://claude.com/blog/zero-trust-for-ai-agents), including access control & privilege management, observability & auditing, and behavioral monitoring & response.

![AgentGuard Positioning](./docs/figs/positioning.png)

AgentGuard can be integrated into existing agent frameworks without modifying the underlying execution logic. Currently, it supports LangChain, AutoGen, and OpenAI Agents SDK, and we are continuously expanding support for additional agent ecosystems and frameworks.
AgentGuard can be integrated into existing agent frameworks without modifying the underlying execution logic. Currently, it supports LangChain, AutoGen, OpenAI Agents SDK, and Openclaw, and we are continuously expanding support for additional agent ecosystems and frameworks. See the documentation chapter on `Openclaw` for the JavaScript-side integration details.

## ✨ Features

### 1. Rich Policy Expressiveness

AgentGuard policies are not hard-coded risk checks buried in business logic. They are written in a standalone DSL that describes when an action should be allowed, denied, or sent for human check. A policy can reference the principal's identity, tool metadata, tool arguments, target addresses, session history, and call-chain context, making it well-suited for the security boundaries commonly found in agent tool calls.

#### Arithmetic & Logical Expressions

Policy conditions support numeric comparisons, set membership checks, regex matching, substring matching, and arbitrary `AND` / `OR` / `NOT` combinations. For instance, `principal.trust_level < 2` distinguishes low-trust agents, `tool.recipient_domain NOT IN allowlist.email` restricts outbound destinations, and `tool.cmd MATCHES ...` identifies dangerous commands. These expressions can also be freely composed with `AND` / `OR` / `NOT`.

#### Cross-Tool Policies

AgentGuard can evaluate both individual tool calls and cross-step attack chains. Using `TRACE` and session-history functions, policies can express behaviors such as "read from a database, then send email," "read a sensitive file, then upload it to an external HTTP endpoint," or "external input eventually flows into a shell command", rather than relying solely on the current tool's arguments.
### 1. Multi-Dimensional Security Protection

#### Multi-Phase Intervention

Policies can apply at the pre-execution `requested` phase, the post-execution `completed` phase, or the failure `failed` phase. Pre-execution is suitable for blocking or requiring approval; post-execution can be used for logging results or triggering follow-up audits and rule evaluations based on `tool.result`.
According to configured safeguards, AgentGuard can intervene before each LLM call, after each LLM output, before each tool invocation, and after execution to identify and block security risks across the full agent runtime. In addition to inline intervention, it also supports post-hoc auditing over stored runtime traces through pluggable custom auditors.

#### Diverse Policy Decisions
#### Seamless Reuse of Existing Security Strategies

When a rule matches, it can return `ALLOW`, `DENY`, `HUMAN_CHECK`, or `LLM_CHECK`. Policies are therefore not limited to a binary allow/deny outcome: clearly dangerous operations can be rejected outright, while uncertain ones can be routed to a human or an LLM for review.
AgentGuard provides a unified interface for adapting existing security protections. Through its modular plugin architecture, rule-based and model-based strategies can be plugged in behind the same interface and enabled dynamically based on practical needs. Today, AgentGuard includes a built-in access-control strategy set, and users can build additional security policies through DSL definitions.

#### Subject & Object Labels
#### Single-Tool and Cross-Tool Protection

Policies can enforce differentiated controls based on agent (subject) and tool (object) attributes. Agents declare identity information such as `agent_id`, `session_id`, `role`, `trust_level`, and `scope`. Tools declare static labels such as `boundary`, `sensitivity`, `integrity`, and `tags`. This enables rules such as "low-trust agents cannot invoke privileged-boundary tools" or "results from high-sensitivity tools must not flow to external boundaries." Users can also define custom labels as needed.
AgentGuard can evaluate both individual tool calls and cross-step attack chains. By efficiently storing runtime context, it can detect behaviors such as "read from a database, then send email," "read a sensitive file, then upload it to an external HTTP endpoint," or "external input eventually flows into a shell command."
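
One way to picture the cross-step check described above is an ordered-subsequence match over the tool names recorded for a session. The sketch below is illustrative only: `contains_chain` and the tool names (`db_query`, `summarize`, `send_email`) are hypothetical and are not taken from AgentGuard's DSL or runtime, which this diff does not show.

```python
from typing import Sequence


def contains_chain(trace: Sequence[str], chain: Sequence[str]) -> bool:
    """Return True if the tools in `chain` occur in `trace` in order.

    The steps do not need to be adjacent; other calls may sit between them.
    """
    remaining = iter(trace)
    # Each `any` consumes the iterator up to its match, so later steps
    # can only match calls that happened after the earlier ones.
    return all(any(call == step for call in remaining) for step in chain)


# Hypothetical session history (tool names are made up for illustration).
session_trace = ["db_query", "summarize", "send_email"]
risky_chain = ["db_query", "send_email"]

flagged = contains_chain(session_trace, risky_chain)
reversed_flagged = contains_chain(list(reversed(session_trace)), risky_chain)
print(flagged, reversed_flagged)
```

Matching on order rather than adjacency is a deliberate choice in this sketch: an intermediate call such as `summarize` should not hide a "read, then exfiltrate" pattern. A real policy would presumably also look at arguments and data flow, not just tool names.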

### 2. Seamless Integration with Agent Frameworks

@@ -88,6 +82,9 @@ Currently, we support the following agent frameworks:
- [LangChain](https://github.com/langchain-ai/langchain)
- [AutoGen](https://github.com/microsoft/autogen)
- [OpenAI Agents SDK](https://github.com/openai/openai-agents-python)
- Openclaw

The integration guides for these frameworks live under `docs/en/how-to-plugin/`, including the dedicated `Openclaw` chapter.

### 3. Visual Policy Configuration & Audit

@@ -101,7 +98,7 @@ AgentGuard uses a centralized control-plane architecture to govern distributed a

## 🚀 Quick Start

### 1. Write Access Control Policies and Start the Control Server
### 1. Write Plugin Config, Then Write Access Control Policies and Start the Control Server

> Docker must be installed first.

@@ -112,7 +109,43 @@ git clone https://github.com/WhitzardAgent/AgentGuard.git
cd AgentGuard
```

Create an access control policy:
First, create a plugin config file for the control server:

```bash
mkdir -p config

cat <<EOF > config/plugins.json
{
"phases": {
"llm_before": {
"client": [],
"server": []
},
"llm_after": {
"client": [],
"server": []
},
"tool_before": {
"client": [],
"server": [
{
"name": "rule_based_plugin",
"env": {}
}
]
},
"tool_after": {
"client": [],
"server": []
}
}
}
EOF
```

This config tells AgentGuard which plugins run in each runtime phase. In this quick start, only `tool_before` enables one server plugin: `rule_based_plugin`. That means the server evaluates access-control rules right before a tool call is executed, while all other phases stay empty. This keeps the first demo simple: the client forwards tool-invocation decisions to the server, and the server uses the built-in rule-based plugin to match your policy rules and return an allow/deny decision.
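
As a quick sanity check on this structure, the following sketch parses a config of the same shape and lists which plugins are enabled per phase and side. It is not AgentGuard's own loader: the helper `enabled_plugins` is hypothetical, and the JSON is inlined (rather than read from `config/plugins.json`) only to keep the example self-contained.

```python
import json

# Same structure as the quick-start config, inlined for a self-contained sketch.
PLUGIN_CONFIG = json.loads("""
{
  "phases": {
    "llm_before":  {"client": [], "server": []},
    "llm_after":   {"client": [], "server": []},
    "tool_before": {"client": [], "server": [{"name": "rule_based_plugin", "env": {}}]},
    "tool_after":  {"client": [], "server": []}
  }
}
""")


def enabled_plugins(config: dict) -> dict:
    """Return {phase: {side: [plugin names]}} for every non-empty slot."""
    result = {}
    for phase, sides in config.get("phases", {}).items():
        for side in ("client", "server"):
            names = [entry["name"] for entry in sides.get(side, [])]
            if names:
                result.setdefault(phase, {})[side] = names
    return result


summary = enabled_plugins(PLUGIN_CONFIG)
print(summary)  # {'tool_before': {'server': ['rule_based_plugin']}}
```

Running a check like this before starting the server can catch a misplaced plugin entry, for example one listed under `client` when it was meant for `server`.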

Then create an access control policy:

```bash
mkdir -p rules
@@ -145,16 +178,22 @@ cp .env.example .env
vi .env
```

Set the server plugin config path in `.env`:

```bash
AGENTGUARD_SERVER_PLUGIN_CONFIG=./config/plugins.json
```

Start the control server:

```bash
./scripts/start.sh -d
```

The control server listens on port `38080`.
The UI listens on port `8080`.
The UI listens on port `38008`.

Visit `http://localhost:8080` to see the UI.
Visit `http://localhost:38008` to see the UI.

### 2. Agent-Side Setup

@@ -312,7 +351,7 @@ https://github.com/user-attachments/assets/75a17e37-7f51-4c59-96fa-ea449eb79859

Current defenses for agent security mainly fall into two categories: **malicious-intent detection at the model layer** and **tool-call behavior interception**. The former strengthens the underlying LLM through fine-tuning or detects unsafe intent by analyzing the model's reasoning process; the latter enforces predefined security policies at tool invocation time based on call traces, arguments, and runtime context to identify, block, or escalate high-risk actions.

Given that model fine-tuning is often expensive to train and deploy, and that many models do not expose a complete reasoning trace, AgentGuard focuses on the tool-call behavior layer. This approach does not require changing the underlying model. Instead, it places security controls around what the agent actually does, which makes it easier to integrate into existing agent stacks and more practical for production deployment.
Given that model fine-tuning is often expensive to train and deploy, and that many models do not expose a complete reasoning trace, AgentGuard focuses on practical runtime controls around both LLM interaction and tool execution. This approach does not require changing the underlying model. Instead, it places security controls around what the agent exchanges with the model and actually does in the environment, which makes it easier to integrate into existing agent stacks and more practical for production deployment.

As illustrated below, existing tool-call-based defenses address parts of the problem, but they are often fragmented and optimized for narrow risk scenarios, such as dangerous command filtering, isolated prompt-injection mitigation, or limited auditing. In contrast, AgentGuard provides a unified framework that more systematically covers access control, runtime behavior monitoring, and execution auditing. This design is also more closely aligned with the enterprise agent-security goals emphasized in Anthropic's [Zero Trust for AI Agents](https://claude.com/blog/zero-trust-for-ai-agents), including least-privilege permissions, constrained tool use, observable execution, and auditable policy enforcement.

@@ -326,8 +365,10 @@ The high-level architecture of AgentGuard is shown below.
<img src="./docs/figs/overview.png" alt="AgentGuard architecture" width="50%" />
</p>

- **Client**: With minimal code modifications, the AgentGuard client integrates into agent frameworks. It monitors every tool call, forwards relevant contextual information to the server, and enforces the server's policy decisions.
- **Server**: The server receives information from clients, evaluates agent actions against policies, produces policy decisions, and sends them back to clients. It also monitors agent status for administrative auditing.
- **Client**: With minimal code modifications, the AgentGuard client integrates into agent frameworks and can intercept before and after LLM calls, as well as before and after tool invocations. It can perform lightweight local filtering on the client side and forward events to the server for deeper inspection by configured plugins.
- **Server**: The server receives information from clients, uses configured plugins to evaluate agent actions against policies, produces policy decisions, and sends them back to clients. It also monitors agent status for administrative auditing.
- **Plugin Extensibility**: Both client and server support pluggable plugins. To add custom plugins, see the [client plugin guide](https://whitzardagent.github.io/AgentGuard/plugins/custom_client_plugin.html) and the [server plugin guide](https://whitzardagent.github.io/AgentGuard/plugins/custom_server_plugin.html).
- **Custom Auditor Extensibility**: The backend also supports pluggable custom auditors for post-hoc trace review. Shared auditor abstractions live under `src/server/backend/audit/`, while concrete auditors live under `src/server/backend/audit/auditors/`. See the documentation chapter on [custom auditors](https://whitzardagent.github.io/AgentGuard/auditors.html).
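
The client/server split described in the list above can be sketched roughly as follows. Everything here is an assumption made for illustration: the plugin contract (a callable returning `"allow"` or `"deny"`), the names `run_phase`, `guarded_tool_call`, and `deny_shell`, and the use of a local function in place of the real client-to-server request. None of these are AgentGuard APIs.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Plugin = Callable[[Event], str]  # hypothetical contract: returns "allow" or "deny"


def run_phase(plugins: List[Plugin], event: Event) -> str:
    """Run plugins in order; the first "deny" short-circuits the phase."""
    for plugin in plugins:
        if plugin(event) == "deny":
            return "deny"
    return "allow"


def guarded_tool_call(name, args, tool_fn, client_plugins, server_plugins):
    """Apply cheap client-side checks first, then the (simulated) server checks."""
    event = {"phase": "tool_before", "tool": name, "args": args}
    if run_phase(client_plugins, event) == "deny":
        raise PermissionError(f"{name} blocked by a client plugin")
    # In a real deployment this step would be a request to the control server.
    if run_phase(server_plugins, event) == "deny":
        raise PermissionError(f"{name} blocked by a server plugin")
    return tool_fn(**args)


def deny_shell(event: Event) -> str:
    """Toy server-side rule: refuse any tool named "shell"."""
    return "deny" if event["tool"] == "shell" else "allow"


result = guarded_tool_call("echo", {"text": "hi"}, lambda text: text, [], [deny_shell])
try:
    guarded_tool_call("shell", {"cmd": "ls"}, lambda cmd: cmd, [], [deny_shell])
    blocked = False
except PermissionError:
    blocked = True
print(result, blocked)
```

Running client plugins before contacting the server mirrors the "lightweight local filtering" mentioned in the Client bullet: obviously unwanted calls can be rejected without a network round trip.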

## 👥 Contributors

@@ -373,7 +414,7 @@ Listed in no particular order. Thanks to everyone who helped shape AgentGuard.
- Support more mainstream frameworks
- Support agent systems in more programming languages
- Enable protection for multi-agent scenarios
- Add monitoring for LLM inputs and outputs
- Expand LLM input/output monitoring and plugin coverage
- Add more varied policy actions
- Provide automatic security policy recommendations
