This document describes the architecture of pycontainer-build, a native Python container image builder that creates OCI-compliant images without requiring Docker or Dockerfiles.
┌─────────────────────────────────────────────────────────────────┐
│ User Interface Layer │
├─────────────────────────────────────────────────────────────────┤
│ CLI (cli.py) │ Python API │ Plugins │
│ - argparse │ - BuildConfig │ - Poetry │
│ - Command handling │ - ImageBuilder │ - Hatch │
│ - Output formatting │ - Programmatic API │ - azd │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Orchestration Layer │
├─────────────────────────────────────────────────────────────────┤
│ ImageBuilder (builder.py) │
│ - Coordinates build process │
│ - Manages build phases (discover → pack → generate → output) │
│ - Handles caching and optimization │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌───────────────────┐ ┌──────────────┐ ┌─────────────────┐
│ Project Layer │ │ OCI Layer │ │ Registry Layer │
├───────────────────┤ ├──────────────┤ ├─────────────────┤
│ project.py │ │ oci.py │ │ registry_client │
│ - Pyproject.toml │ │ - Manifest │ │ - Push/Pull │
│ - Entry points │ │ - Config │ │ - Auth │
│ - Dependencies │ │ - Layers │ │ - Blob upload │
│ - Framework │ │ - Index │ │ - V2 API │
│ detection │ │ - Digests │ │ │
└───────────────────┘ └──────────────┘ └─────────────────┘
│ │ │
└───────────────────┼───────────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Foundation Layer │
├─────────────────────────────────────────────────────────────────┤
│ fs_utils.py │ config.py │ cache.py │
│ - File iteration │ - BuildConfig │ - Blob cache │
│ - Tar creation │ - Validation │ - Layer reuse │
│ - Hashing │ - TOML parsing │ - Eviction policy │
│ - Path handling │ - Serialization │ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Output Targets │
├─────────────────────────────────────────────────────────────────┤
│ Local Layout │ Container Registry │ OCI Artifacts │
│ - dist/image/ │ - GHCR │ - SBOM │
│ - Blobs, manifest │ - ACR │ - Signatures │
│ - Index, refs │ - Docker Hub │ │
└─────────────────────────────────────────────────────────────────┘
Purpose: Command-line interface for building container images.
Responsibilities:
- Parse command-line arguments using argparse
- Validate user inputs
- Create BuildConfig from CLI flags
- Invoke ImageBuilder with config
- Display build progress and results
- Handle errors gracefully
Key Functions:
import sys

def main():
    """Entry point for pycontainer CLI."""
    parser = create_argument_parser()
    args = parser.parse_args()
    config = BuildConfig.from_args(args)
    builder = ImageBuilder(config)
    try:
        builder.build()
        if args.push:
            builder.push()
    except BuildError as e:
        # Report failures on stderr and exit with a non-zero status
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)

CLI Arguments:
- --tag: Image tag (e.g., myapp:latest)
- --context: Build context path
- --workdir: Container working directory
- --env: Environment variables (repeatable)
- --base-image: Base image (Phase 2)
- --push: Push to registry after build (Phase 1)
- --registry: Registry URL
- --verbose: Enable verbose logging
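The flags listed above could be wired up with argparse roughly as follows. This is a minimal sketch, not the project's actual parser: the defaults, the omission of the build subcommand, and the parse_env helper are assumptions made for illustration.

```python
import argparse


def create_argument_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="pycontainer")
    parser.add_argument("--tag", required=True, help="Image tag, e.g. myapp:latest")
    parser.add_argument("--context", default=".", help="Build context path")
    parser.add_argument("--workdir", default="/app", help="Container working directory")
    # action="append" makes the flag repeatable: --env A=1 --env B=2
    parser.add_argument("--env", action="append", metavar="KEY=VALUE")
    parser.add_argument("--base-image", default=None, help="Base image override")
    parser.add_argument("--push", action="store_true", help="Push after build")
    parser.add_argument("--registry", default=None, help="Registry URL")
    parser.add_argument("--verbose", action="store_true", help="Verbose logging")
    return parser


def parse_env(pairs: list[str]) -> dict[str, str]:
    """Hypothetical helper: turn ["KEY=VALUE", ...] into a dict."""
    env = {}
    for pair in pairs:
        # Split on the first "=" only, so values may contain "="
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected KEY=VALUE, got: {pair!r}")
        env[key] = value
    return env


args = create_argument_parser().parse_args(
    ["--tag", "myapp:latest", "--env", "DEBUG=1", "--push"]
)
env = parse_env(args.env or [])  # args.env is None when --env is not given
```

Splitting on the first "=" keeps values such as connection strings intact.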
Purpose: Coordinates the entire build process from project discovery to image output.
Class Structure:
class ImageBuilder:
    def __init__(self, config: BuildConfig):
        self.config = config
        self.output_dir = Path("dist/image")
        self.cache = BlobCache() if config.use_cache else None

    def build(self) -> Path:
        """Build OCI image and return path to output."""
        # Phase 1: Discover project structure
        metadata = discover_project(self.config.context_path)
        # Phase 2: Collect files to include
        files = collect_files(self.config.context_path,
                              self.config.include_paths)
        # Phase 3: Create layer tar
        layer_path = create_layer_tar(files, self.config.workdir)
        layer_digest = hash_file(layer_path)
        # Phase 4: Generate OCI structures
        config_json = build_config_json(self.config, metadata)
        config_digest = write_blob(config_json, self.output_dir)
        layer_blob_path = move_to_blob_store(layer_path, layer_digest)
        manifest = build_manifest(config_digest, layer_digest)
        write_manifest(manifest, self.output_dir)
        return self.output_dir

    def push(self, registry: str = None):
        """Push built image to registry."""
        # Phase 1 feature
        pass

Build Process Flow:
- Discovery: Read pyproject.toml, detect entry point, find source files
- Collection: Gather all files to include in image
- Packing: Create tar archive with proper paths and permissions
- OCI Generation: Create config and manifest JSON
- Output: Write blobs and manifest to disk (or registry)
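The packing step matters for caching because the layer digest only stays stable if the tar bytes do. The sketch below shows one way to pack files deterministically, assuming in-memory file contents and fixed metadata. The function names pack_layer and layer_digest are illustrative and not part of the project.

```python
import hashlib
import io
import tarfile


def pack_layer(files: dict[str, bytes], workdir: str = "/app") -> bytes:
    """Pack {relative_path: content} into an uncompressed tar, deterministically."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for rel_path in sorted(files):  # stable member ordering
            data = files[rel_path]
            info = tarfile.TarInfo(name=f"{workdir.strip('/')}/{rel_path}")
            info.size = len(data)
            info.mode = 0o644
            info.mtime = 0  # fixed timestamp so rebuilds produce the same bytes
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def layer_digest(layer: bytes) -> str:
    """Return the digest in the "sha256:<hex>" form used by OCI descriptors."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()


layer = pack_layer({"src/app.py": b"print('hi')\n", "pyproject.toml": b"[project]\n"})
digest = layer_digest(layer)
```

Because ordering and metadata are fixed, two builds of unchanged sources yield the same digest, which is what makes layer reuse possible.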
Purpose: Introspect Python projects to extract metadata, entry points, structure, and Python version.
Key Functions:
import re
from pathlib import Path

def detect_python_version(context_dir: Path) -> str:
    """
    Detect Python version from pyproject.toml requires-python field.

    Extracts the first major.minor version from patterns like:
    - ">=3.11" → "3.11"
    - "^3.12" → "3.12"
    - "~=3.10" → "3.10"

    Returns:
        Python version string (e.g., "3.11"), defaults to "3.11" if not found
    """
    pyproject = parse_pyproject_toml(context_dir / "pyproject.toml")
    requires_py = pyproject.get("project", {}).get("requires-python")
    if requires_py:
        match = re.search(r'(\d+\.\d+)', requires_py)
        if match:
            return match.group(1)
    return "3.11"

def discover_project(context_path: Path) -> ProjectMetadata:
    """
    Discover Python project structure and metadata.

    Returns:
        ProjectMetadata with name, version, entry_point, include_paths, framework
    """
    pyproject = parse_pyproject_toml(context_path / "pyproject.toml")
    entry_point = detect_entry_point(pyproject)
    include_paths = detect_include_paths(context_path)
    framework = detect_framework(context_path)
    return ProjectMetadata(
        name=pyproject.get("project", {}).get("name"),
        version=pyproject.get("project", {}).get("version"),
        entry_point=entry_point,
        include_paths=include_paths,
        framework=framework
    )

def detect_entry_point(pyproject: dict) -> list[str]:
    """
    Detect entry point from pyproject.toml [project.scripts].

    Converts a script like myapp = "myapp.cli:main" to:
        ["python", "-m", "myapp.cli"]
    Falls back to ["python", "-m", "app"] if no script found.
    """
    scripts = pyproject.get("project", {}).get("scripts", {})
    if not scripts:
        return ["python", "-m", "app"]
    # Use the first script entry; only the module part before ":" is kept
    script_target = next(iter(scripts.values()))
    module_path = script_target.split(":")[0]
    return ["python", "-m", module_path]

def detect_include_paths(context_path: Path) -> list[str]:
    """
    Auto-detect paths to include in image.

    Looks for:
    - src/ directory, otherwise app/ directory
    - Essential files when present: pyproject.toml, requirements.txt, README.md
    """
    # Package directory detection (name matching pyproject.toml name)
    # is not shown here
    paths = []
    if (context_path / "src").exists():
        paths.append("src/")
    elif (context_path / "app").exists():
        paths.append("app/")
    # Add essential files
    for file in ["pyproject.toml", "requirements.txt", "README.md"]:
        if (context_path / file).exists():
            paths.append(file)
    return paths

Data Structures:
@dataclass
class ProjectMetadata:
    name: str
    version: str
    entry_point: list[str]
    include_paths: list[str]
    framework: str | None  # "fastapi", "flask", "django", None

Purpose: Implement OCI Image Specification structures (manifest, config, layers).
OCI Structures:
import json
from dataclasses import asdict, dataclass

@dataclass
class OCIDescriptor:
    """OCI Content Descriptor."""
    mediaType: str
    digest: str
    size: int

# kw_only=True (Python 3.10+) allows required fields to follow defaulted ones;
# without it these class definitions raise TypeError
@dataclass(kw_only=True)
class OCIManifest:
    """OCI Image Manifest v1."""
    schemaVersion: int = 2
    mediaType: str = "application/vnd.oci.image.manifest.v1+json"
    config: OCIDescriptor
    layers: list[OCIDescriptor]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

@dataclass(kw_only=True)
class OCIConfig:
    """OCI Image Config."""
    architecture: str = "amd64"
    os: str = "linux"
    config: dict  # Env, Cmd, WorkingDir, etc.
    rootfs: dict  # {"type": "layers", "diff_ids": [...]}
    history: list[dict]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

@dataclass(kw_only=True)
class OCIIndex:
    """OCI Image Index (manifest list for multi-arch)."""
    schemaVersion: int = 2
    mediaType: str = "application/vnd.oci.image.index.v1+json"
    manifests: list[OCIDescriptor]

Platform Support:
def parse_platform(platform: str) -> tuple[str, str]:
    """Parse platform string (e.g., 'linux/amd64') into (os, arch)."""
    parts = platform.split('/')
    if len(parts) != 2:
        raise ValueError(f"Invalid platform format: {platform}")
    return parts[0], parts[1]  # (os, architecture)

The builder uses this to:
- Parse the --platform flag into OS and architecture
- Select the correct manifest from multi-platform base images
- Generate OCI config and index with proper platform metadata
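Selecting the right manifest from a multi-platform base image can be sketched as a lookup over the image index. The example assumes the index has already been fetched and parsed into a dict. The select_manifest name and the sample digests are hypothetical.

```python
def parse_platform(platform: str) -> tuple[str, str]:
    """Parse 'os/arch' (e.g., 'linux/amd64') into (os, arch)."""
    parts = platform.split("/")
    if len(parts) != 2:
        raise ValueError(f"Invalid platform format: {platform}")
    return parts[0], parts[1]


def select_manifest(index: dict, platform: str) -> dict:
    """Return the index entry whose platform matches 'os/arch'."""
    os_name, arch = parse_platform(platform)
    for descriptor in index.get("manifests", []):
        entry = descriptor.get("platform", {})
        if entry.get("os") == os_name and entry.get("architecture") == arch:
            return descriptor
    raise LookupError(f"No manifest for platform {platform}")


# Sample index with placeholder digests (not real image digests)
index = {
    "schemaVersion": 2,
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
}
chosen = select_manifest(index, "linux/arm64")
```

Variants such as linux/arm/v7 are rejected here, matching the two-part format the parser above expects.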
Key Functions:
def build_manifest(config_desc: OCIDescriptor,
                   layer_descs: list[OCIDescriptor]) -> OCIManifest:
    """Create OCI manifest from config and layer descriptors."""
    return OCIManifest(
        config=config_desc,
        layers=layer_descs
    )

def build_config_json(architecture: str, os_name: str,
                      build_config: BuildConfig,
                      metadata: ProjectMetadata,
                      layer_digests: list[str]) -> OCIConfig:
    """
    Create OCI config JSON from build configuration.

    Includes:
    - Architecture and OS (from --platform flag)
    - Env vars (from build_config.env)
    - Cmd (from metadata.entry_point)
    - WorkingDir (from build_config.workdir)
    - Labels (from build_config.labels)
    - diff_ids (from layer_digests: hex SHA256 of each uncompressed layer tar)
    """
    return OCIConfig(
        architecture=architecture,
        os=os_name,
        config={
            "Env": [f"{k}={v}" for k, v in build_config.env.items()],
            "Cmd": metadata.entry_point,
            "WorkingDir": build_config.workdir,
            "Labels": build_config.labels
        },
        rootfs={
            "type": "layers",
            "diff_ids": [f"sha256:{digest}" for digest in layer_digests]
        },
        history=[
            {"created_by": "pycontainer-build"}
        ]
    )

Purpose: Interact with container registries using Docker Registry v2 API.
Class Structure:
from urllib.parse import urljoin

import requests

class RegistryClient:
    def __init__(self, registry: str, auth: AuthProvider | None = None):
        self.registry = registry
        self.auth = auth
        self.session = requests.Session()

    def _auth_headers(self) -> dict[str, str]:
        """Return the Authorization header, or nothing for anonymous access."""
        if self.auth is None:
            return {}
        return {"Authorization": f"Bearer {self.auth.get_token()}"}

    def push_blob(self, name: str, data: bytes, digest: str) -> bool:
        """
        Push a blob (layer or config) to repository `name` (e.g., "user/myapp").

        Process:
        1. POST /v2/<name>/blobs/uploads/ (initiate upload)
        2. PUT /v2/<name>/blobs/uploads/<uuid>?digest=<digest> (complete)
        """
        # Initiate upload
        response = self.session.post(
            f"https://{self.registry}/v2/{name}/blobs/uploads/",
            headers=self._auth_headers()
        )
        response.raise_for_status()
        # Location may be relative and may already carry query parameters
        upload_url = urljoin(response.url, response.headers["Location"])
        separator = "&" if "?" in upload_url else "?"
        # Complete upload
        response = self.session.put(
            f"{upload_url}{separator}digest={digest}",
            data=data,
            headers={
                "Content-Type": "application/octet-stream",
                "Content-Length": str(len(data)),
                **self._auth_headers()
            }
        )
        return response.status_code == 201

    def push_manifest(self, name: str, manifest: OCIManifest, tag: str) -> bool:
        """Push manifest to registry with specific tag."""
        response = self.session.put(
            f"https://{self.registry}/v2/{name}/manifests/{tag}",
            data=manifest.to_json(),
            headers={
                "Content-Type": "application/vnd.oci.image.manifest.v1+json",
                **self._auth_headers()
            }
        )
        return response.status_code == 201

    def blob_exists(self, name: str, digest: str) -> bool:
        """Check if blob already exists in registry (for caching)."""
        response = self.session.head(
            f"https://{self.registry}/v2/{name}/blobs/{digest}",
            headers=self._auth_headers()
        )
        return response.status_code == 200

Authentication:
import base64

class AuthProvider:
    """Base class for registry authentication."""
    def get_token(self) -> str:
        raise NotImplementedError

class BearerTokenAuth(AuthProvider):
    """OAuth2 Bearer Token authentication."""
    def __init__(self, username: str, password: str, registry: str):
        self.token = self._exchange_token(username, password, registry)

    def get_token(self) -> str:
        return self.token

    def _exchange_token(self, username, password, registry):
        # Implement OAuth2 token exchange
        pass

class DockerConfigAuth(AuthProvider):
    """Read credentials from ~/.docker/config.json."""
    def __init__(self, registry: str):
        self.credentials = self._load_docker_config(registry)

    def get_token(self) -> str:
        # config.json stores base64("username:password"); the decoded pair
        # suits Basic auth or a token exchange, not direct use as a bearer token
        return base64.b64decode(self.credentials).decode()

Purpose: File system utilities for iteration, tar creation, hashing.
Key Functions:
import hashlib
import tarfile
from collections.abc import Iterator
from pathlib import Path

def iter_files(base_path: Path,
               include_patterns: list[str]) -> Iterator[tuple[Path, Path]]:
    """
    Iterate over files matching include patterns.

    Yields:
        (absolute_path, relative_path) tuples
    """
    for pattern in include_patterns:
        for path in base_path.glob(pattern):
            if path.is_file():
                yield path, path.relative_to(base_path)

def create_tar(files: list[tuple[Path, Path]],
               workdir: str) -> Path:
    """
    Create tar archive with files prefixed by workdir.

    Example: src/app.py → /app/src/app.py in tar
    """
    tar_path = Path("app-layer.tar")
    with tarfile.open(tar_path, "w") as tar:
        for abs_path, rel_path in files:
            arcname = f"{workdir.lstrip('/')}/{rel_path}"
            tar.add(abs_path, arcname=arcname)
    return tar_path

def hash_file(path: Path) -> str:
    """Compute SHA256 digest of file."""
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 8 KiB chunks to keep memory use constant
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    return sha256.hexdigest()

Purpose: Build configuration and validation.
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class BuildConfig:
    """Configuration for container image build."""
    tag: str
    context_path: Path
    workdir: str = "/app"
    env: dict[str, str] = field(default_factory=dict)
    include_paths: list[str] = field(default_factory=list)
    base_image: str = "python:3.11-slim"  # Auto-detected from requires-python
    registry: str | None = None
    use_cache: bool = True

    def __post_init__(self):
        self.context_path = Path(self.context_path)
        if not self.context_path.exists():
            raise ValueError(f"Context path not found: {self.context_path}")
        if not self.include_paths:
            # Auto-detect include paths
            self.include_paths = default_include_paths(self.context_path)

    @classmethod
    def from_toml(cls, toml_path: Path) -> "BuildConfig":
        """Load config from pycontainer.toml file."""
        import tomllib
        with toml_path.open("rb") as f:
            data = tomllib.load(f)
        return cls(**data["build"])

Purpose: Layer and blob caching for fast incremental builds.
class BlobCache:
    def __init__(self, cache_dir: Path = None):
        self.cache_dir = cache_dir or Path.home() / ".pycontainer/cache"
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, digest: str) -> Path | None:
        """Get cached blob by digest."""
        blob_path = self.cache_dir / "blobs/sha256" / digest
        return blob_path if blob_path.exists() else None

    def put(self, digest: str, data: bytes) -> Path:
        """Store blob in cache."""
        blob_path = self.cache_dir / "blobs/sha256" / digest
        blob_path.parent.mkdir(parents=True, exist_ok=True)
        blob_path.write_bytes(data)
        return blob_path

    def evict_lru(self, max_size_mb: int = 1000):
        """Evict least recently used blobs to stay under size limit."""
        # Implement LRU eviction policy
        pass

1. User runs: pycontainer build --tag myapp:latest
2. CLI parses args → BuildConfig(tag="myapp:latest", context_path=".")
3. ImageBuilder.build():
   ├─ discover_project() → ProjectMetadata
   ├─ collect_files() → [(abs_path, rel_path), ...]
   ├─ create_tar() → app-layer.tar
   ├─ hash_file() → layer_digest
   ├─ build_config_json() → OCIConfig
   ├─ build_manifest() → OCIManifest
   └─ write_output() → dist/image/
      ├─ manifest.json
      └─ blobs/sha256/
         ├─ <config-digest>
         └─ <layer-digest>
4. Output: dist/image/ (OCI image layout)
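The output tree above is content-addressed: each blob is stored under its own SHA256 digest. A minimal sketch of that write step follows, assuming an uncompressed tar layer. The helper names write_blob and write_image mirror the flow but are illustrative, not the project's actual functions.

```python
import hashlib
import json
from pathlib import Path


def write_blob(data: bytes, output_dir: Path) -> dict:
    """Store data under blobs/sha256/<hex> and return its digest and size."""
    hex_digest = hashlib.sha256(data).hexdigest()
    blob_dir = output_dir / "blobs" / "sha256"
    blob_dir.mkdir(parents=True, exist_ok=True)
    (blob_dir / hex_digest).write_bytes(data)
    return {"digest": f"sha256:{hex_digest}", "size": len(data)}


def write_image(config: dict, layer: bytes, output_dir: Path) -> dict:
    """Write config and layer blobs plus manifest.json; return the manifest."""
    config_bytes = json.dumps(config, sort_keys=True).encode()
    manifest = {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": {
            "mediaType": "application/vnd.oci.image.config.v1+json",
            **write_blob(config_bytes, output_dir),
        },
        "layers": [
            {
                "mediaType": "application/vnd.oci.image.layer.v1.tar",
                **write_blob(layer, output_dir),
            }
        ],
    }
    (output_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling write_image({"os": "linux"}, layer_bytes, Path("dist/image")) would produce the manifest.json and blobs/sha256/ entries shown in the flow.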
1. User runs: pycontainer build --tag myapp:latest --push
2. ImageBuilder.build() → dist/image/
3. ImageBuilder.push():
   ├─ registry_client.push_blob(config_json)
   ├─ registry_client.push_blob(layer_tar)
   └─ registry_client.push_manifest(manifest, tag)
4. Output: Image pushed to ghcr.io/user/myapp:latest
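The ordering in this flow matters: blobs go first and the manifest last, since a registry may reject a manifest that references blobs it does not have. The sketch below illustrates the ordering and the skip-if-present optimization. RecordingClient is a hypothetical stand-in that records calls instead of making HTTP requests, and the digests are placeholders.

```python
class RecordingClient:
    """Stand-in for a registry client; records calls instead of doing HTTP."""

    def __init__(self, existing: set[str]):
        self.existing = existing
        self.calls: list[tuple[str, str]] = []

    def blob_exists(self, digest: str) -> bool:
        return digest in self.existing

    def push_blob(self, data: bytes, digest: str) -> bool:
        self.calls.append(("blob", digest))
        return True

    def push_manifest(self, manifest_json: str, tag: str) -> bool:
        self.calls.append(("manifest", tag))
        return True


def push_image(client, blobs: dict[str, bytes], manifest_json: str, tag: str) -> None:
    """Upload missing blobs, then publish the manifest under the tag."""
    for digest, data in blobs.items():
        if client.blob_exists(digest):
            continue  # already in the registry, nothing to upload
        if not client.push_blob(data, digest):
            raise RuntimeError(f"Blob upload failed: {digest}")
    if not client.push_manifest(manifest_json, tag):
        raise RuntimeError(f"Manifest upload failed: {tag}")


client = RecordingClient(existing={"sha256:aaa"})
push_image(client, {"sha256:aaa": b"config", "sha256:bbb": b"layer"}, "{}", "latest")
```

Here only the second blob and the manifest are sent, because the first blob is reported as already present.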
1. BuildConfig(base_image="python:3.11-slim")
2. ImageBuilder.build():
   ├─ registry_client.pull_manifest("python:3.11-slim")
   ├─ registry_client.pull_layers([layer1, layer2, ...])
   ├─ parse_base_config() → base_env, base_workdir
   ├─ merge_configs(base_config, user_config)
   ├─ create_app_layer()
   └─ build_manifest([base_layers..., app_layer])
3. Output: Multi-layer image with base + app
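The merge_configs step in this flow is not specified in detail, so the following is only one plausible sketch. It assumes that user settings override base settings, that environment variables merge by key, and that labels are combined. It operates on the "config" section of an OCI image config as plain dicts.

```python
def merge_configs(base: dict, user: dict) -> dict:
    """Merge a base image's config section with user settings (user wins)."""
    env: dict[str, str] = {}
    for entry in base.get("Env", []) + user.get("Env", []):
        key, _, value = entry.partition("=")
        env[key] = value  # later (user) entries override base entries

    merged = dict(base)
    # Copy over every user setting that is actually set
    merged.update({k: v for k, v in user.items() if v is not None})
    merged["Env"] = [f"{k}={v}" for k, v in env.items()]
    merged["Labels"] = {**(base.get("Labels") or {}), **(user.get("Labels") or {})}
    return merged


def merge_layers(base_layers: list[dict], app_layer: dict) -> list[dict]:
    """Base layers come first; the application layer is applied on top."""
    return [*base_layers, app_layer]


base_config = {"Env": ["PATH=/usr/local/bin", "LANG=C.UTF-8"], "Cmd": ["python3"]}
user_config = {
    "Env": ["LANG=en_US.UTF-8", "APP_ENV=prod"],
    "Cmd": ["python", "-m", "myapp.cli"],
    "WorkingDir": "/app",
}
merged = merge_configs(base_config, user_config)
```

Keeping the base image's PATH while overriding Cmd and WorkingDir is what lets the application layer run on top of the base runtime.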
All configuration uses Python dataclasses for type safety and validation:
@dataclass
class BuildConfig:
    tag: str
    context_path: Path
    # ... validates on construction

config = BuildConfig(tag="app:v1", context_path="/invalid/path")
# Raises ValueError: Context path not found

Core operations are pure functions (no side effects):

def build_manifest(config_desc: OCIDescriptor,
                   layer_descs: list[OCIDescriptor]) -> OCIManifest:
    # Pure function: same inputs → same output
    return OCIManifest(config=config_desc, layers=layer_descs)

Components are composed rather than inherited:

class ImageBuilder:
    def __init__(self, config: BuildConfig):
        self.config = config
        self.registry_client = RegistryClient(config.registry) if config.registry else None
        self.cache = BlobCache() if config.use_cache else None

All dependencies are passed explicitly (dependency injection):

def discover_project(context_path: Path,
                     pyproject_parser: Callable = parse_pyproject_toml):
    # Dependency injectable for testing
    pyproject = pyproject_parser(context_path / "pyproject.toml")

Rationale: Enable container builds in environments without Docker:
- GitHub Codespaces (Docker not pre-installed)
- Dev Box / Cloud Dev Environments
- CI systems without Docker (faster startup)
- Locked-down corporate laptops
Approach: Implement OCI spec directly using Python stdlib + HTTP requests.
Rationale: Simplify user experience by automatically selecting the correct Python base image from project metadata.
Approach: Parse requires-python from pyproject.toml and construct base image name (e.g., >=3.11 → python:3.11-slim).
Benefits:
- Zero configuration for common cases
- Always includes Python runtime (no invalid app-only images)
- Respects project's Python version requirements
- Users can still override with the --base-image flag
Rationale: Type safety, autocompletion, validation.
Example:
# Dict (error-prone)
config = {"tag": "app:v1", "context": "/path"}
# Dataclass (type-safe)
config = BuildConfig(tag="app:v1", context_path="/path")

Rationale: Minimize configuration, match .NET SDK experience.
Balance: Auto-detect defaults, allow explicit overrides.
- Implement base image parser in registry_client.py
- Add layer merging logic in builder.py
- Update oci.py to handle multi-layer manifests
- Add framework detection in project.py:

      def detect_fastapi(context_path: Path) -> bool:
          # Look for "from fastapi import FastAPI"
          pass

- Add entrypoint generation in oci.py:

      def fastapi_entrypoint(module: str) -> list[str]:
          return ["uvicorn", f"{module}:app", "--host", "0.0.0.0"]
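The detect_fastapi stub could be filled in along the following lines. This is a sketch under stated assumptions, not the project's detection logic: it scans every .py file under the context path for a FastAPI import using a regular expression, which is a heuristic choice made here for illustration.

```python
import re
from pathlib import Path

# Matches "import fastapi", "from fastapi import ..." and
# "from fastapi.<submodule> import ..." at the start of a line
FASTAPI_IMPORT = re.compile(
    r"^\s*(?:from\s+fastapi(?:\.\w+)*\s+import|import\s+fastapi)\b",
    re.MULTILINE,
)


def detect_fastapi(context_path: Path) -> bool:
    """Return True if any Python source under context_path imports FastAPI."""
    for source in sorted(context_path.rglob("*.py")):
        try:
            text = source.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable or non-UTF-8 files
        if FASTAPI_IMPORT.search(text):
            return True
    return False
```

Scanning the whole tree would also visit virtual environments inside the context path, so a real implementation would likely restrict the search to the detected include paths.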
- Implement auth provider in registry_client.py:

      class AzureContainerRegistryAuth(AuthProvider):
          def get_token(self) -> str:
              # Use Azure CLI credentials
              pass

- Register in factory:

      AUTH_PROVIDERS = {
          "ghcr.io": GitHubTokenAuth,
          "azurecr.io": AzureContainerRegistryAuth,
      }
| Operation | Time | Bottleneck |
|---|---|---|
| Project discovery | <100ms | Disk I/O (read pyproject.toml) |
| File collection | ~500ms | Disk I/O (iterate files) |
| Tar creation | ~1s | Disk I/O (write tar) |
| Hash calculation | ~500ms | CPU (SHA256) |
| JSON generation | <50ms | CPU (serialize) |
| Total | ~2.5s | Disk I/O |
- Parallel Hashing: Hash files concurrently (ThreadPoolExecutor)
- Incremental Tar: Only re-pack changed files
- Layer Caching: Skip tar creation if content unchanged
- Blob Streaming: Stream large layers to registry (avoid disk write)
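The parallel hashing idea can be sketched with ThreadPoolExecutor from the standard library. Threads are a reasonable fit here because hashlib releases the GIL while hashing larger buffers. The hash_files name, the chunk size, and the worker count are illustrative assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def hash_file(path: Path) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


def hash_files(paths: list[Path], max_workers: int = 8) -> dict[Path, str]:
    """Hash several files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths
        return dict(zip(paths, pool.map(hash_file, paths)))
```

Whether this is faster than sequential hashing depends on file sizes and storage, so it is worth measuring before adopting it.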
- No network communication (local-only)
- File permissions preserved in tar
- No external dependencies (pure stdlib)
- Auth: Support OAuth2, Basic Auth, token-based
- TLS: HTTPS required for registry communication
- Credentials: Read from ~/.docker/config.json (never log)
- Blob Integrity: Verify SHA256 digest on upload
- Image Verification: Validate base image signatures
- Supply Chain: SBOM generation (Phase 4)
- Minimal Attack Surface: Prefer distroless base images
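Blob integrity checking amounts to recomputing the digest and comparing it with the expected one. A minimal sketch follows, assuming sha256 digests in the "sha256:<hex>" form. The verify_blob name is illustrative.

```python
import hashlib
import hmac


def verify_blob(data: bytes, expected_digest: str) -> None:
    """Raise ValueError unless data hashes to expected_digest ("sha256:<hex>")."""
    algorithm, sep, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256" or not sep:
        raise ValueError(f"Unsupported digest format: {expected_digest!r}")
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest performs a constant-time comparison
    if not hmac.compare_digest(actual_hex, expected_hex.lower()):
        raise ValueError(
            f"Digest mismatch: expected {expected_hex}, got {actual_hex}"
        )


payload = b"layer-bytes"
verify_blob(payload, "sha256:" + hashlib.sha256(payload).hexdigest())
```

The same check applies in both directions: before uploading a blob and after pulling base image layers.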
┌──────────────┐
│ ImageBuilder │
└──────┬───────┘
       │
       ├──> LocalOutput (dist/image/)
       └──> RegistryClient (push to registry)

┌──────────────┐
│ ImageBuilder │
└──────┬───────┘
       │
       ├──> BaseImagePuller (fetch base layers)
       ├──> LayerMerger (merge base + app)
       └──> Output (multi-layer image)

┌──────────────┐
│ ImageBuilder │
└──────┬───────┘
       │
       ├──> PreBuildHook (e.g., run tests)
       ├──> PostBuildHook (e.g., security scan)
       └──> CustomLayerProvider (e.g., SBOM layer)
| Feature | .NET SDK | pycontainer-build | Status |
|---|---|---|---|
| No Dockerfile | ✅ | ✅ | Complete |
| No Docker daemon | ✅ | ✅ | Complete |
| Auto-detect entry point | ✅ | ✅ | Complete |
| Base image support | ✅ | 🔜 | Phase 2 |
| Push to registry | ✅ | 🔜 | Phase 1 |
| Multi-arch builds | ✅ | 🔜 | Phase 4 |
| Layer caching | ✅ | 🔜 | Phase 1 |
- OCI Image Spec: https://github.com/opencontainers/image-spec/blob/main/spec.md
- OCI Image Layout: https://github.com/opencontainers/image-spec/blob/main/image-layout.md
- Docker Registry v2 API: https://docs.docker.com/registry/spec/api/
- .NET SDK Containers: https://github.com/dotnet/sdk-container-builds
Last Updated: 2025-11-19
Document Version: 1.0
Maintainers: pycontainer-build team