
🧠 CODI (Container Dietician)


CODI is a rules-first, AI-assisted container optimisation toolkit that analyses, rewrites, benchmarks, and reports deterministic improvements across Node, Python, and Java stacks. The MVP ships:

  • End-to-end CLI/API pipelines with embedded offline LLM + RAG memory (codi:complete)
  • Air-gapped security enforcement, environment-aware config snapshots, and CPU/security validation suites
  • Curated example runs, a codi dashboard aggregator, and a static browser dashboard (docs/dashboard/) for showcasing optimisation impact
  • Schema-validated CMD/ENTRYPOINT rewrite catalog (cmd_rewrites) with a RulesCatalog helper and regression coverage for deterministic rule selection
  • Renderer-aware CMD rewrites with template promotions ensuring shell-form commands convert to exec-form with rationale comments
  • Full report/API/CLI surfacing of CMD analysis, dedicated report sections, and per-run metadata for CMD rewrites

All flows honour offline defaults, policy guardrails, and reproducible artefact layouts.

🎯 Project Overview

The MVP roadmap delivers two runtimes:

  • codi:slim — Rules-based CLI and API without external model dependencies
  • codi:complete — Slim runtime bundled with an offline LLM and lightweight RAG memory

πŸ“ Repository Structure

codi/
├── core/            # parse, detect, render, build, report, store, security
├── cli/             # Typer/Rich-based CLI interface
├── api/             # FastAPI REST service
├── patterns/        # rules.yml templates for supported stacks
├── models/          # Local LLM documentation and configs
├── docker/          # Dockerfiles for Slim and Complete containers
├── data/            # LLM training data pipeline
│   ├── raw/         # Collected Dockerfiles + metadata
│   ├── curated/     # Standardized, deduplicated data
│   ├── pairs/       # Training pairs (JSONL)
│   └── splits/      # Train/val/test splits
├── training/        # QLoRA training config, notebooks, and adapter packaging
├── eval/            # LLM evaluation harness and metrics
├── tune_module/     # Analyzer and Docker best-practice docs
├── demo/            # Sample applications for testing (Node.js, Python, Java)
├── tests/           # Test suite (unit + CLI + reporter)
├── docs/            # PRD, task plan, estimates, quickstart, runbook
├── Makefile         # Build automation and shortcuts
├── pyproject.toml   # Python project configuration
└── requirements.txt # Core dependencies

✅ Shipped Capabilities

Core rules pipeline

  • Typer/Rich CLI with analyze, rewrite, run, report, all, perf, dashboard, and serve commands
  • Tolerant Dockerfile parser + stack detector + policy validation
  • Stack-specific renderer sourcing patterns/rules.yml
  • Metrics harness capturing size, layer, and timing estimates (dry-run)
  • RunStore for reproducible artefact layout under runs/<timestamp>
  • Markdown and HTML reporter with diffs and rationale sections

API service

  • FastAPI application in api/server.py exposing /analyze, /rewrite, /run, /report
  • codi serve command launching uvicorn with configurable host/port
  • OpenAPI metadata aligned with PRD schemas and response contracts

Container packaging

  • Multi-stage docker/Dockerfile.slim (Python 3.12-slim, non-root codi user, AIRGAP=true)
  • docker/Dockerfile.complete extending Slim with offline LLM runtime, shared /work/runs volumes, and dual health checks
  • docker/runtime_complete.py orchestrator that boots LocalLLMServer before FastAPI

Security and environment

  • httpx air-gap guard enforcing zero outbound calls by default; AIRGAP_ALLOWLIST for selective access
  • Central CodiEnvironment config snapshot (core/config.py) with CLI/API/container toggle support
  • Security gates rejecting risky Dockerfile patterns (privileged, ADD http://, sudo)

LLM and RAG

  • SQLite-backed RAGIndex in core/store.py with cosine retrieval, persisted per run
  • Guarded LLMAssist functions generating summaries and template recommendations without emitting raw Dockerfiles
  • /llm/rank and /llm/explain API endpoints with schema-validated responses
  • QLoRA training pipeline for Qwen2.5-Coder-1.5B (training/qwen15b_lora/)
  • Adapter v0.1 metadata and packaging under models/adapters/qwen15b-lora-v0.1
  • LLM evaluation harness under eval/
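Cosine-ranked retrieval of the kind RAGIndex performs can be sketched as follows (pure-Python illustration; the real index persists vectors in SQLite, and these function names are not its API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 for a zero-norm input."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank stored documents by cosine similarity to the query vector."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]
```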

CMD/ENTRYPOINT optimisation

  • core/cmd_parser.py and core/script_analyzer.py for deterministic CMD/ENTRYPOINT analysis
  • Schema-driven cmd_rewrites catalog in patterns/rules.yml with RulesCatalog selector
  • Renderer integration converting shell-form to exec-form with rationale comments
  • Full CLI/API/report surfacing of CMD analysis and per-run metadata
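The shell-form to exec-form conversion can be illustrated with a simplified sketch (it assumes a plain command with no pipes, &&, or variable expansion, which must keep the shell; the real cmd_parser handles far more cases):

```python
import shlex

def to_exec_form(cmd_line: str) -> str:
    """Convert a shell-form CMD instruction to exec-form, prefixed with a
    rationale comment of the kind the renderer emits."""
    body = cmd_line.strip()[len("CMD"):].strip()
    argv = shlex.split(body)
    exec_form = "CMD [" + ", ".join(f'"{arg}"' for arg in argv) + "]"
    return f"# CMD rewrite: exec-form avoids an extra shell and forwards signals\n{exec_form}"
```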

Dashboard

  • codi dashboard command aggregating runs to a JSON dataset
  • Static browser dashboard in docs/dashboard/ with run cards and stack aggregates
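A run aggregator of this shape might look like the following sketch (the metrics.json layout and the `stack` key are assumptions for illustration, not the actual codi dashboard schema):

```python
import json
from pathlib import Path

def aggregate_runs(runs_root: str) -> dict:
    """Collect per-run metrics.json files into one dashboard-ready dataset.

    Assumes each run directory holds a metrics.json with a `stack` key.
    """
    runs = []
    for metrics_file in sorted(Path(runs_root).glob("*/metrics.json")):
        record = json.loads(metrics_file.read_text())
        record["run_id"] = metrics_file.parent.name
        runs.append(record)
    # Per-stack run counts for the aggregate cards.
    by_stack: dict[str, int] = {}
    for record in runs:
        stack = record.get("stack", "unknown")
        by_stack[stack] = by_stack.get(stack, 0) + 1
    return {"runs": runs, "stacks": by_stack}
```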

Release automation

  • .github/workflows/release-images.yml publishing codi:slim and codi:complete to GHCR
  • cosign keyless signatures + SPDX SBOM attestations
  • make release-images / make publish-images Makefile targets

Data pipeline (for LLM training)

  • GitHub Dockerfile collector, quality labelling, standardisation, and synthetic pair generation
  • Stratified train/val/test splits under data/splits/

🧪 Technical Specifications

| Component | Technology | Status |
| --- | --- | --- |
| Language | Python 3.12 | ✅ Production-ready |
| CLI Framework | Typer + Rich | ✅ Operational |
| Renderer | Jinja2 + policy guards | ✅ Operational |
| CMD Rewrite Catalog | YAML schema + RulesCatalog helper + renderer integration | ✅ Operational |
| Build Runner | Dry-run metrics estimator | ✅ Operational |
| Reporter | Markdown + handcrafted HTML | ✅ Operational |
| API Framework | FastAPI + Uvicorn | ✅ Operational |
| Container Packaging | Multi-stage Dockerfiles (Slim & Complete) | ✅ Production-ready |
| Complete Runtime Launcher | Python orchestrator (docker/runtime_complete.py) | ✅ Operational |
| Container Runtime | Dry-run heuristics (real BuildKit builds planned for v0.2) | ⏳ In progress |
| Code Quality | Ruff, Black, mypy | ✅ Enforced via Makefile |
| Local LLM Server | Threaded HTTP stub (core/llm.py) | ✅ Assist-ready |
| RAG Memory | SQLite-based RAGIndex with cosine retrieval | ✅ Operational |
| LLM Assist Functions | LLMAssist summary + template recommendation | ✅ Integrated |
| Dashboard Aggregator | core/dashboard.py + static viewer (docs/dashboard/) | ✅ Operational |
| Air-gap Guard | httpx outbound interceptor + env toggles | ✅ Enforced |
| Environment Configuration | core/config.py snapshots + environment metadata | ✅ Operational |
| Testing | pytest (49 tests incl. tests/test_rules.py) | ℹ️ Run python3 -m pytest |

Note: Size and layer metrics are heuristic estimates produced by the dry-run build runner — no real Docker builds are executed in the current release. Reported reductions reflect template-level analysis. Real BuildKit integration is planned for v0.2.
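One way such a heuristic can be computed is by counting layer-producing instructions; this sketch is illustrative only and does not reproduce the estimator's actual logic:

```python
# Dockerfile instructions that create filesystem layers (simplified view).
LAYER_INSTRUCTIONS = {"FROM", "RUN", "COPY", "ADD"}

def estimate_layers(dockerfile_text: str) -> int:
    """Count instructions that create filesystem layers in a Dockerfile."""
    count = 0
    for line in dockerfile_text.splitlines():
        word = line.strip().split(" ", 1)[0].upper() if line.strip() else ""
        if word in LAYER_INSTRUCTIONS:
            count += 1
    return count
```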

🚀 Quick Start

📘 Prefer copy-paste commands? See docs/quickstart.md.

Prerequisites

  • Python 3.12+ (for local development)
  • Docker (for containerized deployment)
  • Make

Option 1: Local Development

git clone https://github.com/KiniunCorp/codi.git
cd codi
make setup                    # create .venv and install all dev dependencies
source .venv/bin/activate     # activate the virtual environment
make test                     # execute pytest suite

End-to-End Run + Report

# Execute deterministic optimisation against a project directory
codi run demo/node

# Generate human-readable report for the latest run folder
LATEST_RUN=$(ls -dt runs/* | head -n 1)
codi report "$LATEST_RUN"

# Reporter writes Markdown and HTML under runs/<id>/reports/

ℹ️ codi run emits "LLM Assist" and "CMD Summary" panels detailing template recommendations and applied CMD rewrites alongside metrics.

Inspect CMD rewrite comments

# Run the optimiser on the Node demo and capture the latest run directory
codi run demo/node
LATEST_RUN=$(ls -dt runs/* | head -n 1)

# Inspect promoted builder steps and CMD rewrite rationale comments
grep -n "CMD rewrite" "$LATEST_RUN"/candidates/*.Dockerfile
grep -n "RUN pip wheel" "$LATEST_RUN"/candidates/*.Dockerfile

Renderer outputs include CMD rewrite rationale comments and builder-stage promotions sourced from cmd_rewrites.

Run Smoke Validation

# Execute the automated smoke suite across Node/Python/Java demos
python3 -m pytest tests/test_smoke.py

CPU Sanity Check

python3 -m cli.main perf --out runs/perf --analysis-budget 5 --total-budget 180
jq . runs/perf/cpu_perf_report.json

Detailed guidance lives in docs/performance_cpu_sanity.md.

Dashboard Dataset & Viewer

# Aggregate runs into a dashboard-ready dataset
python3 -m cli.main dashboard \
  --runs docs/examples/dashboard \
  --export-json docs/dashboard/data/sample_runs.json \
  --relative-to docs/dashboard

# Serve the static dashboard (opens on http://127.0.0.1:8001)
python3 -m http.server --directory docs/dashboard 8001

The dashboard fetches JSON generated by codi dashboard and renders run cards, stack aggregates, and links to Markdown/HTML reports. See docs/dashboard.md for full instructions.

Exercise the Local LLM stub

python3 - <<'PY'
from core.llm import LocalLLMClient, LocalLLMServer

with LocalLLMServer() as server:
    client = LocalLLMClient(server.base_url)
    print(client.complete("Summarise CODI smoke test benefits."))
PY

Environment toggles & defaults

# Persist runs somewhere else without passing --out to every CLI command
export CODI_OUTPUT_ROOT="$HOME/codi-runs"

# Allow-lists make it easy to invoke the FastAPI test client while keeping AIRGAP enabled
export AIRGAP_ALLOWLIST="testserver,internal.example.com"

# Disable assist calls (fallback summaries still render) or point to a remote endpoint
export LLM_ENABLED=false
# export LLM_ENDPOINT="http://127.0.0.1:8081"

# Override the default rules file if you want to experiment with custom templates
# export RULES_PATH=/path/to/custom/rules.yml

# Run the suite afterwards to verify everything remains green
python3 -m pytest

Launch FastAPI Service

# Serve the CODI API (defaults: host=127.0.0.1, port=8000)
codi serve --host 0.0.0.0 --port 8000

# Analyze a project via HTTP
curl -X POST "http://localhost:8000/analyze" \
  -H 'Content-Type: application/json' \
  -d '{"project_path": "demo/node"}' | jq

# Generate a full run over the same project
curl -X POST "http://localhost:8000/run" \
  -H 'Content-Type: application/json' \
  -d '{"project_path": "demo/node"}' | jq

Option 2: Containerized Deployment

Build the Slim Container

# Build the multi-stage Slim container image
make build-slim

# Or directly with Docker
docker build -f docker/Dockerfile.slim -t codi:slim .

Build the Complete Container

# Build the Complete container image with embedded offline LLM runtime
make build-complete

# Or directly with Docker
docker build -f docker/Dockerfile.complete -t codi:complete .

Run as API Server (Default)

# Start the API server with volume mount
make run-slim

# Or directly with Docker
docker run --rm -it -v "$PWD:/work" -p 8000:8000 codi:slim

# Access the API at http://localhost:8000
curl http://localhost:8000/

Run the Complete Container (API + LLM)

# Start the Complete image with API, embedded LLM, and mounted model weights
docker run --rm -it \
  -v "$PWD:/work" \
  -v "$HOME/.codi-models:/models" \
  -e AIRGAP=true \
  -p 8000:8000 -p 8081:8081 \
  codi:complete

# Verify both services respond
curl http://localhost:8000/docs
curl http://localhost:8081/healthz

ℹ️ LLM_ENABLED, AIRGAP, and MODEL_MOUNT_PATH default to secure offline values; mount your own weight directory at /models (or set MODEL_MOUNT_PATH) to inject larger models without rebuilding the image.

🔒 Need selective outbound access? Provide AIRGAP_ALLOWLIST=internal.example.com (comma-separated) or disable temporarily with AIRGAP=false for controlled testing.

Tagged Releases & Verification

Automated GHCR publishing with provenance:

  1. Dry-run builds locally

    # Loads release-tagged images into Docker without pushing
    make release-images RELEASE_VERSION=v1.4.0 IMAGE_NAMESPACE=my-org/codi
  2. Publish from a workstation (requires docker login ghcr.io):

    make publish-images \
      RELEASE_VERSION=v1.4.0 \
      IMAGE_NAMESPACE=my-org/codi \
      REGISTRY=ghcr.io
  3. Tag + push (git tag v1.4.0 && git push origin v1.4.0) to invoke .github/workflows/release-images.yml, which:

    • Builds ghcr.io/<namespace>/codi-slim and ghcr.io/<namespace>/codi-complete
    • Publishes v1.4.0, latest, and digest tags
    • Generates SPDX SBOMs and uploads them as workflow artifacts
    • Signs images + SBOM attestations with cosign keyless (OIDC)
  4. Verify signatures anywhere

    OWNER=my-org
    REPO=codi
    IMAGE=ghcr.io/$OWNER/$REPO/codi-slim:v1.4.0
    
    cosign verify \
      --certificate-identity "https://github.com/${OWNER}/${REPO}/.github/workflows/release-images.yml@refs/tags/v1.4.0" \
      --certificate-oidc-issuer https://token.actions.githubusercontent.com \
      "$IMAGE"
    
    cosign verify-attestation \
      --type spdxjson \
      "$IMAGE"

    Substitute OWNER/REPO with your fork if publishing under a different org.

Refer to docs/runbook.md for the end-to-end release checklist, approval gates, and rollback plan.

Run CLI Commands

# Override the default entrypoint to run CLI commands
docker run --rm -v "$PWD:/work" codi:slim \
  codi all /work/demo/node --dry-run

# Or get an interactive shell with all CLI verbs available
make run-slim-cli

# Inside the container you can verify the installation
codi --version
codi report --in /work/runs/<latest>

💡 Swap codi:slim for codi:complete to run the same CLI workflows with the embedded LLM assist enabled by default.

Example: Analyze a Project via Container

# Mount your project directory and analyze
docker run --rm -v "$PWD:/work" codi:slim \
  codi all /work/demo/node --dry-run

# Results are written to /work/runs/<timestamp>/
ls -la runs/

πŸ—ΊοΈ Roadmap

🎯 Epic A — CODI Core (Rules-Only)

  • CODI-MVP-001 — Initialise repo skeleton & Makefile
  • CODI-MVP-002 — Bootstrap CLI (Typer/Rich) with stubs
  • CODI-MVP-003 — Implement tolerant Dockerfile parser
  • CODI-MVP-004 — Implement stack detector (node/python/java)
  • CODI-MVP-005 — Seed patterns/rules.yml for 3 stacks
  • CODI-MVP-006 — Implement renderer (Jinja2 + policy guards)
  • CODI-MVP-007 — Build runner (BuildKit) + metrics capture
  • CODI-MVP-008 — Reporter (Markdown + HTML, diffs & rationale)
  • CODI-MVP-009 — Store module for runs/ artefacts
  • CODI-MVP-010 — Security & policy gates
  • CODI-MVP-011 — FastAPI service with 4 endpoints
  • CODI-MVP-012 — Slim container packaging
  • CODI-MVP-013 — Create minimal sample apps (3 stacks)
  • CODI-MVP-014 — End-to-end Slim smoke on 3 stacks
  • CODI-MVP-015 — Quickstart docs for Slim

🤖 Epic B — Local LLM Enhancement

  • CODI-MVP-016 — Integrate local LLM server (Ollama/llama.cpp)
  • CODI-MVP-017 — RAG store (SQLite/Chroma) + retrieval helper
  • CODI-MVP-018 — LLM-assist functions with strict boundaries
  • CODI-MVP-019 — Complete container packaging
  • CODI-MVP-020 — Airgap + model mount toggles

🔄 Epic C — Unified Complete Container

  • CODI-MVP-021 — Env wiring & configuration
  • CODI-MVP-022 — CPU-only perf sanity tests
  • CODI-MVP-023 — Security & air-gap verification
  • CODI-MVP-024 — Models README & runbook
  • CODI-MVP-025 — Example runs + dashboard how-to

🎓 Supported Stacks

| Stack | Builder Base | Runtime Base | Status |
| --- | --- | --- | --- |
| Node.js / Next.js | node:20-slim | node:20-alpine | ✅ Supported |
| Python / FastAPI | python:3.12-slim | python:3.12-slim | ✅ Supported |
| Java / Spring Boot | maven:3.9-eclipse-temurin-21 | eclipse-temurin:21-jre | ✅ Supported |

📚 Documentation

Full documentation suite → docs/deliverables/docs/INDEX.md

The index covers installation, CLI usage, API reference, architecture, LLM module, rules guide, operations, security, CI/CD release, performance, and testing — organised by role.

Quick links

| Guide | Description |
| --- | --- |
| Installation & Setup | Clone, venv, platform notes |
| CLI Guide | All commands, env flags, workflows |
| API Guide | FastAPI endpoints, schemas, examples |
| Architecture | System diagram, module deep-dive |
| Slim Container | Build and run codi:slim |
| Complete Container | Embedded LLM runtime, adapter mounts |
| LLM Module | Data pipeline, training, evaluation |
| Rules Guide | Template authoring, CMD rewrites |
| Operations Runbook | Day-2 health checks, troubleshooting |
| Security | Air-gap controls, container hardening |
| CI/CD & Release | Signing, SBOMs, rollback |
| Performance | Budgets, codi perf, tips |
| Reference | Commands, schemas, glossary, roadmap |

Product spec

  • PRD — Full MVP specification
  • Task plan — Engineering breakdown

πŸ” Security & Privacy

  • Air-gapped by default — no external calls in the rules-only pipeline
  • Template-based rendering guarded by security policies and allowlists
  • Reporter embeds policy notes and rationale for every candidate
  • Build runner operates in dry-run mode until BuildKit integration lands
  • Air-gap guard blocks outbound HTTP(S) when AIRGAP=true; optionally set AIRGAP_ALLOWLIST for vetted internal hosts

πŸ† Success Metrics (MVP Goals)

| Metric | Target | Status |
| --- | --- | --- |
| Median size reduction | ≥40% | ⏳ Pending BuildKit integration |
| Syntactically valid candidates | 100% | ✅ Enforced via parser + policy checks |
| Report generation | Every run | ✅ Automated |
| Offline operation | 0 outbound calls | ✅ Enforced by air-gap guard |
| Analysis performance | ≤3s (no build) | ✅ CPU dry-run suite via codi perf |
| Full run performance | ≤5m per stack | ✅ Dry-run pipeline <0.01s; real builds forthcoming |

🤝 Contributing

Contributions are welcome! Development follows the task plan in docs/codi_mvp_tasks.md.

  1. Pick a task from the roadmap or open an issue
  2. Create a feature branch
  3. Implement with tests (python3 -m pytest)
  4. Run make lint + make test
  5. Submit a PR referencing the task ID

πŸ“ License

MIT License — see LICENSE for details.

🙋 Support & Resources

  • PRD: docs/codi_mvp_prd.md
  • Tasks: docs/codi_mvp_tasks.md
  • Issues: Track progress and report bugs via repository issues

Project Status: ✅ MVP Active — rules pipeline, CMD optimisation, local LLM assist, and release automation shipped
Next Milestone: Public launch, real BuildKit build integration, CHANGELOG
Last Updated: 2026-05-04
