jmweb-org

AI, machine learning and MLOps projects

Small, well-made command-line tools for the daily workflow of machine learning and AI engineers. Each does one thing, runs offline, returns honest exit codes, and drops into CI. No services, no accounts, nothing to administer.

Data

  • dsdiff — A git-style diff between two dataset files: schema changes plus column-level distribution drift (PSI), with a CI gate.
  • splitcheck — Detect rows that leak between train, validation and test splits, exact and after normalization, and fail CI when they do.
  • pii-sweep — Scan dataset files for personally identifiable information, with a confidence per column and a CI gate, before the data leaves your hands.
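
The column-level drift score mentioned for dsdiff, the population stability index (PSI), can be sketched in a few lines. This is a minimal illustration, not dsdiff's implementation: the function name `psi`, the equal-width binning, the default of 10 bins, and the `eps` floor for empty bins are all assumptions made for the example.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two non-empty numeric samples.

    Bin edges are equal-width over the range of `expected` (an assumption;
    quantile bins are another common choice).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant column

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = list(range(100))
shifted = [v + 50 for v in baseline]
print(psi(baseline, baseline))  # identical samples give 0.0
print(psi(baseline, shifted))   # a large shift gives a large score
```

A common rule of thumb treats a PSI above roughly 0.25 as a significant shift. The thresholds a given tool applies may differ.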

Environment and reproducibility

  • mlenv — Snapshot the full ML stack (Python, CUDA, driver, torch build, GPUs, env vars) to one file and diff two snapshots.
  • repro-manifest — Capture a portable manifest of a run's environment, code, config and seeds, then diff two manifests to explain why two runs differed.
  • gpu-gate — Wait for a free GPU, claim it, set CUDA_VISIBLE_DEVICES, and run your command. The wait-pick-export-run loop for shared multi-GPU boxes without a cluster scheduler.
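
The wait-pick-export-run loop that gpu-gate describes might look roughly like the sketch below. It is not gpu-gate's code: the helper names (`free_gpus`, `wait_and_run`), the 100 MiB "free" threshold, and the polling interval are assumptions, and the sketch takes no lock, so it does not actually claim the GPU against a competing process.

```python
import os
import subprocess
import time


def free_gpus(csv_text, max_used_mib=100):
    """Parse 'index, memory.used' CSV rows and return the idle GPU indices."""
    free = []
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        if int(used) <= max_used_mib:
            free.append(int(index))
    return free


def query_nvidia_smi():
    """Ask nvidia-smi for per-GPU memory use as headerless, unitless CSV."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_and_run(command, probe=query_nvidia_smi, poll_seconds=5.0):
    """Wait for a free GPU, export CUDA_VISIBLE_DEVICES, run the command."""
    while True:
        candidates = free_gpus(probe())
        if candidates:
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(candidates[0]))
            return subprocess.run(command, env=env).returncode
        time.sleep(poll_seconds)
```

Passing the probe as a parameter keeps the loop testable on a machine without a GPU, for example `wait_and_run(["python", "train.py"], probe=lambda: "0, 9000\n1, 12\n")`, where `train.py` is a hypothetical script.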

Evaluation

  • evalgate — Decide whether an eval delta is a real regression or sampling noise, and fail CI only when it is real.
  • slicemap — Find the data slices where a new model regressed against an old one, ranked by how many rows are affected.
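
One way to separate a real regression from sampling noise, as evalgate sets out to do, is a paired bootstrap over per-example scores. The sketch below is an illustration under that assumption only. The page does not say which statistical test evalgate uses, and the function names, the 2000 resamples, and the 95% interval are choices made for the example.

```python
import random


def bootstrap_delta_ci(old, new, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (new - old)."""
    rng = random.Random(seed)  # seeded so a CI run is repeatable
    diffs = [n - o for o, n in zip(old, new)]
    size = len(diffs)
    means = []
    for _ in range(n_boot):
        sample = [diffs[rng.randrange(size)] for _ in range(size)]
        means.append(sum(sample) / size)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def is_real_regression(old, new, **kwargs):
    """True only when the whole interval lies below zero."""
    _, upper = bootstrap_delta_ci(old, new, **kwargs)
    return upper < 0


old_scores = [1] * 100             # per-example correctness, old model
new_scores = [1] * 60 + [0] * 40   # new model fails 40 of the same examples
print(is_real_regression(old_scores, new_scores))
print(is_real_regression(old_scores, old_scores))
```

A CI gate built on this would exit non-zero only when `is_real_regression` returns True, so a small drop that sits inside the noise band would not block a merge.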

Serving and cost

  • servectl — Serve a model file over HTTP in one command, with health and Prometheus metrics built in.
  • tokenmeter — Count tokens and estimate cost for prompts before you send them, from the command line or as a CI budget gate.
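
A budget gate of the kind tokenmeter describes reduces to three steps: count tokens, multiply by a price, and compare against a limit. The sketch below is a rough illustration, not tokenmeter's logic. It assumes a crude four-characters-per-token heuristic instead of a real tokenizer, and the price and budget values are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; a real tokenizer will give other counts."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(prompts, usd_per_1k_tokens):
    """Return (total_tokens, estimated_usd) for a batch of prompts."""
    tokens = sum(estimate_tokens(p) for p in prompts)
    return tokens, tokens / 1000 * usd_per_1k_tokens


def budget_gate(prompts, usd_per_1k_tokens, max_usd):
    """Exit-code style verdict: 0 when within budget, 1 when over it."""
    _, cost = estimate_cost(prompts, usd_per_1k_tokens)
    return 0 if cost <= max_usd else 1


prompts = ["Summarize the following report in three bullet points."]
tokens, cost = estimate_cost(prompts, usd_per_1k_tokens=0.5)  # hypothetical price
print(tokens, cost, budget_gate(prompts, 0.5, max_usd=0.01))
```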

Design

The tools share a deliberate shape. Each is a single command that takes a file or two and returns a verdict: a readable terminal view, --json for machines, and a meaningful exit code. Most ship a --check mode that fails CI on the change that matters, so they slot into a pipeline or a pre-commit hook without adopting a platform. They are offline-first and dependency-light, and each is small enough for one engineer to maintain.
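
The shared shape described above can be sketched as a small argparse skeleton. Everything here is hypothetical: the tool name `exampletool`, the byte-for-byte `compare` function, and the choice of exit code 1 for a failed check illustrate the pattern and are not taken from any of the tools listed.

```python
import argparse
import json


def compare(path_a, path_b):
    """Stand-in verdict: do the two files differ byte for byte?"""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        changed = fa.read() != fb.read()
    return {"a": path_a, "b": path_b, "changed": changed}


def main(argv=None):
    parser = argparse.ArgumentParser(prog="exampletool")
    parser.add_argument("a")
    parser.add_argument("b")
    parser.add_argument("--json", action="store_true",
                        help="emit the verdict as JSON for machines")
    parser.add_argument("--check", action="store_true",
                        help="exit non-zero when the files differ")
    args = parser.parse_args(argv)

    verdict = compare(args.a, args.b)
    if args.json:
        print(json.dumps(verdict))
    else:
        print("changed" if verdict["changed"] else "identical")

    # Without --check the command only reports; with it, a change fails the run.
    return 1 if (args.check and verdict["changed"]) else 0

# In a script entry point this would be wired up as: sys.exit(main())
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the command easy to test and to call from a pre-commit hook wrapper.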

More

Machine learning and MLOps work, and the rest of these projects, at jmwebsoluciones.com.

Popular repositories

  1. evalgate (Public)

    Decide whether an eval delta is a real regression or sampling noise.

    Python

  2. .github (Public)

    Organization profile

  3. gpu-gate (Public)

    Wait for a free GPU, claim it, and run a command on it.

    Python

  4. mlenv (Public)

    Snapshot the machine learning environment and diff two snapshots.

    Python

  5. dsdiff (Public)

    Diff two dataset files: schema changes plus column-level distribution drift.

    Python

  6. repro-manifest (Public)

    Capture a reproducibility manifest for a run and diff two manifests.

    Python
