This document establishes coding guidelines and design principles for tools in this repository.
Tools in this repository follow the UNIX philosophy:
- Do one thing well - Each tool has a focused purpose
- Compose via pipes and files - Tools read/write JSON, YAML, plain text
- Text streams are the universal interface - Prefer text over binary formats
- Small is beautiful - Favor simple implementations over clever ones
- Fail fast, fail loud - Clear error messages, non-zero exit codes
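These principles show up directly in shell usage. A toy pipeline (built from standard utilities only, since the repo's own tool names are covered later) illustrates pipe composition and fail-fast behavior:

```shell
#!/usr/bin/env bash
# Fail fast, fail loud: exit on errors, unset variables, and failed pipe stages
set -euo pipefail

# Compose via pipes: each stage reads and writes plain text
printf 'bob\nalice\n' | sort | head -n1   # prints: alice
```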
```
✓ speaker_detection       (no extension, executable)
✓ speaker_samples         (no extension, executable)
✗ speaker_detection.py    (avoid .py for CLI tools)
```
Why no .py extension?
- Cleaner invocation: `./speaker_detection` vs `python3 speaker_detection.py`
- Signals "this is a command", not "this is a library"
- Matches UNIX convention (`ls`, `grep`, `git`)
Required setup:
```shell
chmod +x tool_name   # Make executable
```

Required shebang:

```python
#!/usr/bin/env python3
```

Tools should work both as standalone commands AND as importable modules:
```python
#!/usr/bin/env python3
"""
tool_name - Brief description

Usage:
    tool_name command [options]

Environment:
    SOME_VAR - Description (default: value)
"""
import argparse
import json
import sys

# ----------------------------------------------------------------------
# Core Functions (importable)
# ----------------------------------------------------------------------

def do_something(input_path, options=None):
    """
    Process input and return result.

    Can be called directly when imported as a library.
    """
    # Implementation
    return result

def another_function(data):
    """Another reusable function."""
    pass

# ----------------------------------------------------------------------
# CLI Commands
# ----------------------------------------------------------------------

def cmd_action(args):
    """CLI handler for 'action' command."""
    result = do_something(args.input, args.options)
    print(json.dumps(result, indent=2))
    return 0

# ----------------------------------------------------------------------
# Main (CLI entry point)
# ----------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(description="Tool description")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # ... parser setup (each subparser sets set_defaults(func=cmd_...)) ...
    args = parser.parse_args()
    return args.func(args)

if __name__ == "__main__":
    sys.exit(main())
```

Benefits:
- Test core logic without subprocess spawning
- Compose tools programmatically
- Reuse functions across tools
Import example:

```python
from speaker_detection import load_speaker, compute_b3sum
from speaker_samples import extract_segments
```

Use comment banners to organize code:
```python
# ----------------------------------------------------------------------
# Configuration
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# Core Functions (importable)
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# CLI Commands
# ----------------------------------------------------------------------

# ----------------------------------------------------------------------
# Main
# ----------------------------------------------------------------------
```

Tools that share data use a common directory specified by an environment variable:
```
$SPEAKERS_EMBEDDINGS_DIR/          # Default: ~/.config/speakers_embeddings
├── config.json                    # Global settings (optional)
├── db/                            # Structured data (JSON per entity)
│   ├── alice.json
│   └── bob.json
├── embeddings/                    # Binary/opaque data by entity
│   ├── alice/
│   └── bob/
└── samples/                       # Extracted artifacts by entity
    ├── alice/
    │   ├── sample-001.mp3
    │   └── sample-001.meta.yaml   # Sidecar metadata
    └── bob/
```
Principles:

- One JSON file per entity - Easy to inspect, `jq`-queryable, git-diffable
- Sidecar metadata - `file.ext` + `file.meta.yaml` for provenance
- Content-addressable where possible - Use hashes (b3sum/sha256) as identifiers
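A minimal sketch of the one-JSON-file-per-entity layout (the directory and entity fields here are illustrative, not the repo's actual schema):

```python
import json
import tempfile
from pathlib import Path

# Stand-in for $SPEAKERS_EMBEDDINGS_DIR (a real test would set the env var instead)
data_dir = Path(tempfile.mkdtemp(prefix="demo_"))
(data_dir / "db").mkdir()

# One JSON file per entity: trivially inspectable, jq-queryable, git-diffable
entity = {"id": "alice", "version": 1}
entity_path = data_dir / "db" / "alice.json"
entity_path.write_text(json.dumps(entity, indent=2) + "\n")

print(json.loads(entity_path.read_text())["id"])  # alice
```

Because each entity is its own file, `jq '.id' db/alice.json` and `git diff db/` both work with no tooling beyond the standard UNIX kit.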
```python
from datetime import datetime, timezone

SCHEMA_VERSION = 1  # Bump on breaking changes

def create_entity(id: str, ...) -> dict:
    """Create entity with standard fields."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "id": id,
        "version": SCHEMA_VERSION,
        # ... entity-specific fields ...
        "created_at": now,
        "updated_at": now,
    }
```

For artifacts (audio, images), store provenance in `.meta.yaml`:
```yaml
version: 2
artifact_id: sample-001
b3sum: abc123...    # Content hash for verification
source:
  file: /path/to/source.mp3
  b3sum: xyz789...  # Source content hash
extraction:
  tool: tool_name
  tool_version: 1.0.0
  extracted_at: 2026-01-16T10:30:00Z
```

Use standard exit codes:

```python
return 0  # Success
return 1  # Error (general)
return 2  # Usage error (bad arguments)
```

```python
# Results go to stdout (for piping)
print(json.dumps(result, indent=2))

# Status/progress goes to stderr (for humans)
print("Processing...", file=sys.stderr)

# Errors go to stderr
print(f"Error: {message}", file=sys.stderr)
```

Support multiple formats via `--format`:
```python
if args.format == "json":
    print(json.dumps(data, indent=2))
elif args.format == "table":
    print(format_table(data))
elif args.format == "ids":
    print("\n".join(item["id"] for item in data))
```

Every script MUST check environment variables before using defaults. This enables test isolation:
```python
# CORRECT - env var first, then default
DEFAULT_DIR = os.path.expanduser("~/.config/tool_name")

def get_data_dir() -> Path:
    return Path(os.environ.get("TOOL_NAME_DIR", DEFAULT_DIR))

# INCORRECT - hardcoded (breaks test isolation!)
def get_data_dir() -> Path:
    return Path.home() / ".config" / "tool_name"  # BAD!
```

Why this matters:
- Tests set `SPEAKERS_EMBEDDINGS_DIR=/tmp/test_$$` for isolation
- CI/CD can point to ephemeral directories
- Multiple test runs don't interfere
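The isolation pattern in action, as a self-contained sketch (`TOOL_NAME_DIR` and the default path are illustrative):

```python
import os
import tempfile
from pathlib import Path

DEFAULT_DIR = os.path.expanduser("~/.config/tool_name")

def get_data_dir() -> Path:
    # Env var first, then default - this is what makes tests isolatable
    return Path(os.environ.get("TOOL_NAME_DIR", DEFAULT_DIR))

# A test harness points the tool at a throwaway directory before any tool code runs
test_dir = tempfile.mkdtemp(prefix="test_")
os.environ["TOOL_NAME_DIR"] = test_dir
print(get_data_dir() == Path(test_dir))  # True
```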
```
TOOL_NAME_SETTING          # Tool-specific
SPEAKERS_EMBEDDINGS_DIR    # Shared across speaker tools
SPEAKER_DETECTION_DEBUG    # Debug flags
```
| Variable | Description | Default | Used By |
|---|---|---|---|
| `SPEAKERS_EMBEDDINGS_DIR` | Base directory for speaker data (profiles, embeddings, samples) | `~/.config/speakers_embeddings` | `speaker_detection`, `speaker_samples` |
| `SPEAKER_DETECTION_BACKEND` | Default embedding backend when `--backend` is not specified | `speechmatics` | `speaker_detection` |
| `SPEAKER_BACKENDS_CONFIG` | Path to YAML config file defining available backends | `speaker_detection_backends/backends.yaml` | `speaker_detection_backends/base.py` |
| `SPEECHMATICS_API_KEY` | API key for Speechmatics speech-to-text and speaker ID services | (required) | `speaker_detection_backends/speechmatics_backend.py` |
| `SPEAKER_DETECTION_DEBUG` | Enable debug logging for speaker identification (any non-empty value) | (disabled) | `speaker_detection_backends/speechmatics_backend.py` |
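The resolution order implied by this table (explicit CLI flag, then env var, then documented default) can be sketched as follows; the helper name and the `"pyannote"` value are illustrative, not part of the repo:

```python
import os

def resolve_backend(cli_backend=None):
    """CLI flag wins; otherwise SPEAKER_DETECTION_BACKEND; otherwise the documented default."""
    return cli_backend or os.environ.get("SPEAKER_DETECTION_BACKEND", "speechmatics")

os.environ.pop("SPEAKER_DETECTION_BACKEND", None)  # ensure env is unset for the demo
print(resolve_backend())            # speechmatics
print(resolve_backend("pyannote"))  # pyannote (illustrative value)
```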
```python
if not path.exists():
    print(f"Error: File not found: {path}", file=sys.stderr)
    return 1
```

```python
def validate_id(id: str) -> bool:
    """IDs: lowercase alphanumeric with hyphens/underscores."""
    import re
    return bool(re.match(r"^[a-z0-9][a-z0-9_-]*$", id))
```

See evals/TESTING.md for comprehensive testing documentation.
- Reproducible audio - Use espeak-ng for synthetic voices
- Isolated directories - Always use env vars, never hardcode paths
- Docker-first - All tests runnable in `evals/Dockerfile.test`
```
evals/
├── TESTING.md           # Comprehensive testing docs
├── Dockerfile.test      # Reproducible test environment
└── tool_name/
    ├── Makefile         # Audio generation
    ├── audio/           # Generated audio (gitignored)
    ├── samples/         # Reference transcripts
    ├── test_cli.py      # CLI integration tests
    ├── benchmark.py     # Performance benchmarks
    └── test_all.sh      # Run all tests
```
```shell
# Local (requires espeak-ng, ffmpeg)
cd evals/speaker_detection
make all        # Generate test audio
./test_all.sh   # Run tests

# Docker (reproducible)
docker build -f evals/Dockerfile.test -t speaker-tools-test .
docker run --rm speaker-tools-test
```

```python
# In test_cli.py - set env BEFORE imports!
import os
import tempfile

TEST_DIR = tempfile.mkdtemp(prefix="test_")
os.environ["SPEAKERS_EMBEDDINGS_DIR"] = TEST_DIR

# Now import
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from speaker_detection import load_speaker, compute_b3sum
```

Every tool has a module docstring with:
- One-line description
- Usage examples
- Environment variables
`tool_name.README.md`

`ramblings/YYYY-MM-DD--topic-name.md`
Use Mermaid diagrams for visual documentation.
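For example, a tool pipeline could be diagrammed like this (an illustrative sketch, not the repo's actual workflow):

```mermaid
flowchart LR
    audio[audio.mp3] --> det[speaker_detection]
    det --> tr[transcript.json]
    tr --> samp[speaker_samples]
    samp --> out[samples/alice/]
```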
Prefer blake3 (b3sum) for content hashing:
```python
import hashlib
import subprocess
from pathlib import Path

def compute_b3sum(file_path: Path) -> str:
    """Compute blake3 hash, fall back to sha256."""
    try:
        result = subprocess.run(
            ["b3sum", "--no-names", str(file_path)],
            capture_output=True, check=True, text=True,
        )
        return result.stdout.strip()[:32]
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Fallback to sha256
        h = hashlib.sha256()
        with open(file_path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()[:32]
```

Why blake3?

- Fast (SIMD-optimized)
- Content-addressable like git
- `b3sum` CLI widely available
```shell
./speaker_samples segments -t transcript.json -l S1 | \
  jq -r '.start, .end' | \
  xargs -n2 ./process_segment
```

```python
from speaker_detection import load_speaker, get_samples_by_source_audio
from speaker_samples import extract_segment

profile = load_speaker("alice")
samples = get_samples_by_source_audio("alice", audio_b3sum)
```

```shell
# Extract → Review → Enroll
./speaker_samples extract audio.mp3 -t transcript.json -l S1 -s alice
./speaker_samples review alice sample-001 --approve
./speaker_detection enroll alice audio.mp3 -t transcript.json -l S1
```

Checklist for new tools:

- No `.py` extension, has shebang `#!/usr/bin/env python3`
- `chmod +x` applied
- Module docstring with usage and env vars
- Core functions separate from CLI handlers
- Works as importable library
- Uses shared storage conventions if applicable
- Outputs JSON to stdout, status to stderr
- Returns proper exit codes
- Has `{tool_name}.README.md`
- Tests in `evals/{tool_name}/`