parrotsec-ai

A custom build of parrotsec/security bundled with Claude Code and llama.cpp for security analysis backed by either the Claude API or self-hosted GGUF models.

Purpose

parrotsec-ai is a containerized workspace for authorized offensive-security engagements: Claude Code as a red-team pair operator, backed by either the hosted Anthropic API or a locally-served abliterated model, with the full Parrot Security toolkit on PATH. It is built for:

- Penetration tests against systems you own or have written authorization to assess
- CTF (capture-the-flag) work
- Security research in isolated lab environments
- Drafting reports, payloads, and tooling for engagements with a documented Rules of Engagement (ROE)

It is not a general-purpose AI coding container. The repo ships workspace/CLAUDE.md, which is bind-mounted into the container at /workspace/CLAUDE.md and governs how Claude Code operates inside an engagement: scope and authorization checks, PTES-style workflow, audit-trail discipline in notes.md, and a preference for Parrot's distro-shipped tooling over ad-hoc pip/clone installs. Mount your own directory at /workspace to swap in a different operating guide for a specific engagement.

Disclaimer

This image is intended to be paired with abliterated or otherwise reduced-safety models (e.g. mradermacher/Huihui-Qwen3.6-27B-abliterated-GGUF) that will produce content — exploit code, shellcode, phishing pretexts, credential-attack harnesses — that mainstream models refuse. Read these warnings before running it. They are adapted from the warnings shipped with the recommended abliterated model.

- Sensitive or controversial outputs. The recommended models have had most of their safety tuning removed. They will produce content that is illegal to deploy outside an authorized scope. Treat every output as raw material requiring human review before execution against a target.
- Not for general audiences. This is a security professional's tool. Do not expose it to end users, customer-facing applications, minors, or unsupervised environments.
- Legal and ethical responsibility is yours. You are solely responsible for ensuring every action taken from this container (recon, exploitation, data collection, payload deployment) is covered by written authorization (engagement ROE, lab ownership, CTF rules, bug-bounty scope) and complies with applicable law. Absence of an ROE is a stop signal, not an obstacle.
- Research and experimental use only. Use in lab, CTF, or authorized engagement contexts. Do not embed this image in production pipelines or expose its endpoints to untrusted networks.
- Monitor and review outputs. Do not auto-execute model-generated commands or payloads. Read what the model produced; if a generated tool or payload would have a destructive blast radius, confirm explicit written approval before running it.
- No safety guarantees. Neither the abliterated model authors, the llama.cpp project, nor this image's maintainers warrant the safety, legality, or correctness of generated outputs. Use is at your own risk.

The workspace CLAUDE.md reinforces these constraints at the agent level — authorization, ROE, and deconfliction govern every action regardless of whether the underlying model would refuse on its own.

What's inside

- parrotsec/security:latest base image (full Parrot Security toolset)
- Node.js 20 + @anthropic-ai/claude-code (the claude CLI)
- An entrypoint that wires Claude Code to a local llama.cpp endpoint and applies the Claude Code KV-cache fix described by Unsloth (see below)
- Two build variants:
  - Linux + NVIDIA: Dockerfile + docker-compose.yml. Compiles llama-server from source against CUDA 12.6 and runs it inside the container on port 8001.
  - macOS / Apple Silicon: Dockerfile.macos + docker-compose.macos.yml. Skips the CUDA build; you run llama-server natively on the Mac (Metal-accelerated) and the container reaches it via host.docker.internal.

Mainstream instruction-tuned models refuse legitimate pentest requests (payload generation, reverse shells, password cracking) even inside an authorized engagement, which is why this image is built around abliterated GGUFs. The specific quant to pick depends on your platform — see the platform sections below.

For action steps, jump straight to your platform:

Linux + NVIDIA GPU → Linux + NVIDIA
macOS on Apple Silicon → macOS / Apple Silicon

The sections immediately below apply to both platforms.

Volumes

All persistence is done through bind mounts so state survives container rebuilds. The compose file wires these up for you; the directories are created on first run.

| Host path | Container path | Purpose |
| --- | --- | --- |
| `./workspace` | `/workspace` | Your working directory: projects, reports, loot. |
| `./models` | `/models` | GGUF models. Doubles as the HuggingFace download cache. (Linux only; the macOS variant runs `llama-server` on the host.) |
| `./claude` | `/root/.claude` | Claude Code auth tokens, history, and settings. |

Swap ./workspace for any absolute path you prefer (e.g. an existing engagement directory): -v /srv/engagements/acme:/workspace.

Using Claude Code

ANTHROPIC_BASE_URL=http://localhost:8001 and ANTHROPIC_API_KEY=sk-no-key-required are pre-set inside both container variants, so:

claude --model llama

llama-server only serves a single model at a time, so the value passed to --model is just a label — anything works. Use whatever name you'll recognize in the session UI.

To talk to the hosted Anthropic API instead, unset the base URL and the placeholder key and token variables for that session:

unset ANTHROPIC_BASE_URL ANTHROPIC_API_KEY ANTHROPIC_AUTH_TOKEN
claude

The Claude Code KV-cache fix

By default Claude Code injects an attribution header on every request, which defeats the local KV cache and slows inference by ~90% — see Unsloth's writeup. The fix has two halves:

~/.claude/settings.json is patched with:

{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "0",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  }
}

The CLAUDE_CODE_ATTRIBUTION_HEADER flag has to live in settings.json — exporting it from the shell does not take effect.

llama-server is started with --kv-unified (plus --cache-type-k q8_0, --cache-type-v q8_0, --flash-attn on, --fit on) so the KV cache is reusable across requests.

The entrypoint applies the settings.json patch on both platforms and merges into any existing settings.json, so your other Claude Code preferences are preserved. The llama-server flags are baked into the Linux entrypoint; on macOS they belong on the llama-server command you run on the host (see macOS / Apple Silicon).
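One way to confirm that the settings.json half of the fix landed is to look for the flag in the patched file. The sketch below is a hypothetical helper, not part of the repo's entrypoint: the function name `check_attribution_header` is made up for illustration, and the grep pattern assumes the flag is stored as the string `"0"`, exactly as in the JSON shown above.

```shell
# Hypothetical helper (not shipped in the repo): report whether a Claude Code
# settings.json carries CLAUDE_CODE_ATTRIBUTION_HEADER set to "0".
check_attribution_header() {
    # Default path is an assumption based on the ~/.claude/settings.json location above.
    local settings="${1:-$HOME/.claude/settings.json}"
    if [ ! -f "$settings" ]; then
        echo "missing"
        return 1
    fi
    # Match the key and the "0" value with any whitespace around the colon.
    if grep -Eq '"CLAUDE_CODE_ATTRIBUTION_HEADER"[[:space:]]*:[[:space:]]*"0"' "$settings"; then
        echo "ok"
    else
        echo "unpatched"
        return 1
    fi
}
```

Inside the container, `check_attribution_header /root/.claude/settings.json` should print `ok` once the entrypoint has run. A text match like this does not validate the JSON itself, so treat it as a quick sanity check only.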

Exposed ports

8001 — llama-server HTTP API. Reachable from the host at http://localhost:8001. Do not expose to untrusted networks; it has no auth.
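To check that the server behind this port is actually up, llama-server's `/health` endpoint can be polled. The following is a minimal sketch, assuming `curl` is installed; the helper names `llama_health_url` and `wait_for_llama` are hypothetical and do not exist in the repo.

```shell
# Hypothetical helpers: build the health URL and poll it until the server answers.
llama_health_url() {
    # Use the argument if given, else ANTHROPIC_BASE_URL, else the README default.
    local base="${1:-${ANTHROPIC_BASE_URL:-http://localhost:8001}}"
    printf '%s/health\n' "${base%/}"   # strip one trailing slash before appending
}

wait_for_llama() {
    local url tries="${2:-30}"
    url="$(llama_health_url "${1:-}")"
    while [ "$tries" -gt 0 ]; do
        # -f makes curl fail on HTTP errors, e.g. while the model is still loading.
        if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
            echo "ready: $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    echo "not ready: $url" >&2
    return 1
}
```

From the host, `wait_for_llama` with no arguments polls the default `http://localhost:8001`; pass a different base URL as the first argument and a retry count as the second if the defaults do not fit.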

Updating

# Linux + NVIDIA
docker compose build --pull
docker compose up -d

# macOS
docker compose -f docker-compose.macos.yml build --pull
docker compose -f docker-compose.macos.yml up -d

Your workspace/, claude/, and models/ directories are untouched. To bump the bundled llama.cpp build (Linux only — macOS uses the host's Homebrew build), set LLAMACPP_REF (a git tag, e.g. b8950):

docker compose build --build-arg LLAMACPP_REF=b8950

Linux + NVIDIA

Prerequisites

NVIDIA GPU passthrough is enabled by default in docker-compose.yml. The host must have the NVIDIA Container Toolkit installed and configured (sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker).

The image bundles CUDA 12.6 runtime libraries. The driver libraries come from the host through the container toolkit, so the host driver only needs to be new enough for CUDA 12.6 (560 series or later).

If you need to run on a host without a GPU, remove (or comment out) the runtime: nvidia line and the deploy.resources block in docker-compose.yml; llama-server will fall back to CPU inference (slow).
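The driver requirement above can be checked on the host before building. This is a rough sketch under stated assumptions: `driver_is_new_enough` is a hypothetical helper, and it compares only the major version against the 560 series mentioned above.

```shell
# Hypothetical helper: succeed if a driver version string is at or above a
# minimum major version (560 by default, per the prerequisite above).
driver_is_new_enough() {
    local version="$1" min_major="${2:-560}"
    local major="${version%%.*}"       # "560.35.03" -> "560"
    case "$major" in
        ''|*[!0-9]*) return 2 ;;       # empty or non-numeric: cannot decide
    esac
    [ "$major" -ge "$min_major" ]
}
```

On a host with the NVIDIA driver installed, the version string can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1` and passed to the helper as its first argument.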

Build

docker compose build

The first build compiles llama.cpp from source against CUDA 12.6 — expect 5–15 minutes depending on host. Rebuilds are cached.

Recommended model

mradermacher/Huihui-Qwen3.6-27B-abliterated-GGUF ships GGUFs of huihui_ai's abliterated Qwen 3.6 27B in quants from Q2_K (10.8 GB) to Q8_0 (28.7 GB); Q4_K_M (16.6 GB) is a good default for 24 GB VRAM cards. Read the Disclaimer before using an abliterated model.

Run

Pick a model (a local GGUF file in ./models/ or a HuggingFace repo spec) and pass it as LLAMA_MODEL:

LLAMA_MODEL=mradermacher/Huihui-Qwen3.6-27B-abliterated-GGUF:Q4_K_M docker compose up -d
docker compose exec parrotsec-ai bash

Inside the container you land in /workspace with claude and llama-server on PATH and ANTHROPIC_BASE_URL already pointing at the local server. Verify GPU passthrough with:

docker compose exec parrotsec-ai nvidia-smi

LLAMA_MODEL accepts three forms:

# 1. A filename in /models (./models on the host)
LLAMA_MODEL=Huihui-Qwen3.6-27B-abliterated.Q4_K_M.gguf

# 2. An absolute path (when mounting a single GGUF directly)
LLAMA_MODEL=/models/some/sub/dir/model.gguf

# 3. A HuggingFace repo spec — pulled on first run, cached under /models
LLAMA_MODEL=mradermacher/Huihui-Qwen3.6-27B-abliterated-GGUF:Q4_K_M

If LLAMA_MODEL is unset, the container starts but llama-server does not. You can launch it manually after attaching, or use claude against the hosted Anthropic API instead.
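The three accepted forms can be told apart with a simple pattern match. The sketch below illustrates one way to do that; it is not the repo's entrypoint.sh, and the function name `resolve_llama_model` and its fallback behaviour are assumptions. It maps each form to the `--model` or `-hf` argument that llama-server would receive.

```shell
# Hypothetical helper: turn a LLAMA_MODEL value into llama-server arguments.
resolve_llama_model() {
    local spec="$1" models_dir="${2:-/models}"
    case "$spec" in
        "")     return 1 ;;                                          # unset: skip llama-server
        /*)     printf '%s %s\n' '--model' "$spec" ;;                # form 2: absolute path
        *.gguf) printf '%s %s/%s\n' '--model' "$models_dir" "$spec" ;;  # form 1: file in /models
        */*)    printf '%s %s\n' '-hf' "$spec" ;;                    # form 3: HuggingFace repo spec
        *)      printf '%s %s/%s\n' '--model' "$models_dir" "$spec" ;;  # assumed fallback: file in /models
    esac
}
```

For example, `resolve_llama_model /models/some/sub/dir/model.gguf` prints `--model /models/some/sub/dir/model.gguf`, while a repo spec such as the one in form 3 is passed through after `-hf`.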

`llama-server` configuration

All knobs are overridable via shell env or .env:

| Variable | Default | Purpose |
| --- | --- | --- |
| `LLAMA_MODEL` | empty | Model to serve (see the three forms above). If unset, `llama-server` is not started. |
| `LLAMA_PORT` | `8001` | Port `llama-server` binds to (and the host port mapping). |
| `LLAMA_HOST` | `0.0.0.0` | Bind address inside the container. |
| `LLAMA_CTX_SIZE` | `131072` | Context window in tokens. Claude Code requires ≥ 64k. |
| `LLAMA_CACHE` | `/models` | Where `-hf` downloads land. |
| `LLAMA_EXTRA_ARGS` | empty | Extra flags appended to `llama-server` (e.g. sampling). |

The entrypoint passes --reasoning off to llama-server so the recommended thinking-by-default Qwen3 abliterated models reply directly. Override with LLAMA_EXTRA_ARGS=--reasoning=on if you want chain-of-thought.
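As an illustration of the variables above, a `.env` file next to docker-compose.yml might look like the following. The values are examples, not recommendations from the repo: the context size is simply the 64k minimum stated in the table, and the `--temp` and `--top-p` sampling values are placeholders. How the entrypoint splits `LLAMA_EXTRA_ARGS` is not documented here, so the quoting is an assumption.

```shell
# Example .env (illustrative values; variable names are the ones documented above)
LLAMA_MODEL=mradermacher/Huihui-Qwen3.6-27B-abliterated-GGUF:Q4_K_M
LLAMA_PORT=8001
# 64k tokens, the stated minimum for Claude Code; smaller than the 131072 default
LLAMA_CTX_SIZE=65536
# Placeholder sampling flags appended to the llama-server command line
LLAMA_EXTRA_ARGS="--temp 0.7 --top-p 0.9"
```

The same names can be set inline for a single run, in the style of the `LLAMA_MODEL=... docker compose up -d` command shown earlier.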

macOS / Apple Silicon

Docker Desktop on macOS cannot pass Metal/MPS into containers, so the bundled CUDA llama.cpp build doesn't help you on a Mac. The Mac flow runs llama-server natively on the host (with full Metal acceleration) and points the container at it via host.docker.internal.

Prerequisites

Install llama-server on the Mac. Either install via Homebrew (brew install llama.cpp) or download a precompiled macOS build from https://github.com/ggml-org/llama.cpp/releases.

Recommended model

The 27B used on Linux+CUDA is too large for typical M-series unified memory. Use the 9B Qwen3.5 abliterated build instead — mradermacher/Huihui-Qwen3.5-9B-abliterated-GGUF at Q4_K_M runs comfortably on common Apple Silicon configurations. Read the Disclaimer before using an abliterated model.

Start `llama-server` on the host

From the repo root:

LLAMA_CACHE=$(pwd)/models llama-server \
    -hf mradermacher/Huihui-Qwen3.5-9B-abliterated-GGUF:Q4_K_M \
    --port 8001 \
    --kv-unified \
    --cache-type-k q8_0 --cache-type-v q8_0 \
    --flash-attn on --fit on \
    --reasoning off \
    --ctx-size 131072

LLAMA_CACHE=$(pwd)/models redirects HuggingFace downloads into ./models/, matching the Linux flow. --reasoning off skips the model's thinking phase — the recommended abliterated builds are thinking-by-default and Claude Code stalls waiting on the chain-of-thought block. --kv-unified, --cache-type-{k,v} q8_0, --flash-attn on, and --fit on are the llama-server half of the Claude Code KV-cache fix; keep them.

Build & run the container

In another terminal:

docker compose -f docker-compose.macos.yml up -d --build
docker compose -f docker-compose.macos.yml exec parrotsec-ai bash
# inside: claude --model llama

The macOS image ships the same Parrot toolset, Claude Code, workspace CLAUDE.md, and entrypoint-side KV-cache fix — it just outsources the model server to the host so inference uses Apple Silicon's GPU instead of running CPU-only inside Docker Desktop's Linux VM.

To bring it down:

docker compose -f docker-compose.macos.yml down
