Checkpoint config-driven env GRPO eval work by ProfSynapse · Pull Request #92 · ProfSynapse/Synaptic-Tuner

ProfSynapse · 2026-05-05T18:15:03Z

Summary:

Adds the current workspace multistep GRPO projection and refreshed lean SFT datasets/config.
Adds workspace multi-turn eval scenario coverage plus focused tests for agentic loops, environment search, response scoping, and stage gates.
Adds generic environment execution/scoring support for configured tool aliases so path scoring can match schema-facing commands without hardcoding a toolset.
Adds env generation diagnostics, SFT prompt alignment migration, local GRPO image, and PEFT merge helper.

Validation:

python -m pytest tests/shared/test_agentic_loop.py tests/shared/test_local_environment_search.py tests/shared/test_workspace_multiturn_scenarios.py tests/synthchat/test_agentic_episode_messages.py tests/synthchat/test_response_scope_message_selection.py tests/synthchat/test_stage_gates.py
python .skills/scripts/sync_skill_trees.py --check
git diff --check

…u, and animation scenes - Implemented `LiveEvaluationDashboard` for real-time evaluation metrics display. - Created `generate_round_flask` function to visually represent a flask shape in terminal. - Developed interactive menu using `asciimatics` with animated branding and options. - Added scene creation functions for logo display, training start splash, and celebration animations.

…ation monitoring - Implemented SynthChatMetrics to track generation progress, including total examples, completed, valid, and invalid counts. - Created ResultEntry class for logging individual results with status, category, and reason. - Developed LiveSynthChatDashboard class for displaying metrics and recent results in a user-friendly format. - Integrated rich console output for enhanced visual representation of progress and results. - Added methods for updating metrics, building display, and handling live updates.

…; update CLI commands and menus

…I UI; add project configuration files

…m overview - Implemented ListHandler to manage 'list' subcommands for datasets, models, runs, rubrics, and scenarios. - Added JSON output support for list commands. - Created StatusHandler to provide an overview of system state, including environment info, CUDA availability, dependencies, and service connectivity. - Enhanced output formatting with rich display options for both handlers.

- Delete dead Evaluator parsers (not imported anywhere) - Evaluator/response_parser.py (614 lines) - Evaluator/tool_call_parser.py (354 lines) - Remove duplicate SynthChat validators (already using shared/validation/) - SynthChat/services/validators/base.py - SynthChat/services/validators/structure_validator.py - SynthChat/services/validators/cross_scope_validator.py - SynthChat/services/validators/content/ (6 files) - Update CLAUDE.md: replace improvement_engine references with SynthChat (improvement_engine/ directory doesn't exist, functionality is in SynthChat) Total: ~1,200 lines of duplicate/dead code removed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

Was looking for Trainers/shared/ui/ but shared UI is at shared/ui/ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

…odel size matching

* Add Claude Code skills for fine-tuning, evaluation, and upload-deployment; rewrite README Create 3 new skills with progressive disclosure architecture (lean SKILL.md + focused reference docs) covering the full pipeline: fine-tuning (SFT/KTO/GRPO, 7 reference docs), evaluation (CLI, scenarios, backends, 5 reference docs), and upload-deployment (GGUF, merging, model cards, 4 reference docs). Update synthetic-data-generation skill with reference docs and helper scripts. Rewrite README as an agentic-first entrypoint with problem/solution framing and progressive disclosure pattern. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * Add cross-platform AI coding tool compatibility section to README Document how to use the agent skills with Cursor, Windsurf, Cline, Roo Code, Amazon Q, JetBrains AI, Augment, Kilo Code, Tabnine, Zed, GitHub Copilot, and Aider. Include copy commands and platform-specific notes. Broaden framing from Claude Code-only to any AI coding agent. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * Simplify cross-platform section: add AGENTS.md convention and .skills/ universal folder Fix platform guidance to mention AGENTS.md entrypoint convention, add universal .skills/ folder at project root for Zed/Aider/Copilot/others, and streamline table. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * feat: Add parallel worker support for docs-based generation Enable --workers flag to parallelize document processing in SynthChat. Previously, parallel workers only applied to non-docs scenarios. Changes: - Add ThreadPoolExecutor for docs when workers > 1 - Preserve sequential behavior when workers == 1 - Reuse existing worker pattern for consistency - Progress reporting works for both parallel and sequential modes Fixes #1 Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * fix: Guard division-by-zero in parallel progress tracking Fixes bug identified in PR review where `completed/total*100` would crash when total==0 (e.g., empty docs list or all scenarios not found). Changes: - Guard division: pct = (completed/total*100) if total > 0 else 0 - Skip ThreadPoolExecutor creation when no work items - User feedback: "No work items to process (check scenario names)" Applied to both parallel paths (docs and non-docs). Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * refactor: Apply 6 PR review improvements 1. DRY violation: Extract shared _run_parallel_generation() helper - Eliminates ~60 lines of duplication between docs/non-docs paths - Encapsulates progress tracking, executor, error handling 2. Input validation: Clamp workers to max(1, args.workers) - Prevents ValueError from --workers 0 or negative values 3. Output ordering: Sort results by task_id to preserve document order - Parallel mode now returns results in input order, not completion order 4. Variable naming: Rename worker_id to task_id throughout - More accurate (it's a task counter, not thread identifier) 5. Private method access: Rename _generate_single to generate_single - Makes method officially public (called from outside class) 6. BaseException handling: Add try/except with executor cleanup - Graceful handling of KeyboardInterrupt, SystemExit Co-Authored-By: Claude Sonnet 4.5 <[email protected]> --------- Co-authored-by: Claude Sonnet 4.5 <[email protected]>

* Add Claude Code skills for fine-tuning, evaluation, and upload-deployment; rewrite README Create 3 new skills with progressive disclosure architecture (lean SKILL.md + focused reference docs) covering the full pipeline: fine-tuning (SFT/KTO/GRPO, 7 reference docs), evaluation (CLI, scenarios, backends, 5 reference docs), and upload-deployment (GGUF, merging, model cards, 4 reference docs). Update synthetic-data-generation skill with reference docs and helper scripts. Rewrite README as an agentic-first entrypoint with problem/solution framing and progressive disclosure pattern. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * Add cross-platform AI coding tool compatibility section to README Document how to use the agent skills with Cursor, Windsurf, Cline, Roo Code, Amazon Q, JetBrains AI, Augment, Kilo Code, Tabnine, Zed, GitHub Copilot, and Aider. Include copy commands and platform-specific notes. Broaden framing from Claude Code-only to any AI coding agent. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * Simplify cross-platform section: add AGENTS.md convention and .skills/ universal folder Fix platform guidance to mention AGENTS.md entrypoint convention, add universal .skills/ folder at project root for Zed/Aider/Copilot/others, and streamline table. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * docs: Pin parallel docs workers feature to Working Memory Added context about PR #55: - --workers N now supports docs-based generation - Architecture: _run_parallel_generation() helper, instance isolation - API change: generate_single() now public - Input validation: clamps workers to max(1, value) Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * gitignore * chore: Add reference skills and clean up old PACT files Add comprehensive skill documentation: - evaluation: Model testing and validation - fine-tuning: SFT, KTO, GRPO training guides - synthetic-data-generation: Dataset generation and improvement - upload-deployment: Model upload and GGUF conversion Clean up old PACT files now managed by plugin: - Remove .claude/agents/*.md (now in plugin) - Remove .claude/commands/PACT/*.md (now in plugin) - Remove .claude/hooks/*.py (now in plugin) - Remove .claude/protocols/*.md (now in plugin) - Remove .claude/skills/pact-* (now in plugin) - Remove .claude/skills/n8n-* (now in plugin) - Update .claude/settings.json to remove hook references Other changes: - Update CLAUDE.md test output location - Update SynthChat README and docs_loader - Add new documentation files Co-Authored-By: Claude Sonnet 4.5 <[email protected]> --------- Co-authored-by: Claude Sonnet 4.5 <[email protected]>

* Add Claude Code skills for fine-tuning, evaluation, and upload-deployment; rewrite README Create 3 new skills with progressive disclosure architecture (lean SKILL.md + focused reference docs) covering the full pipeline: fine-tuning (SFT/KTO/GRPO, 7 reference docs), evaluation (CLI, scenarios, backends, 5 reference docs), and upload-deployment (GGUF, merging, model cards, 4 reference docs). Update synthetic-data-generation skill with reference docs and helper scripts. Rewrite README as an agentic-first entrypoint with problem/solution framing and progressive disclosure pattern. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * Add cross-platform AI coding tool compatibility section to README Document how to use the agent skills with Cursor, Windsurf, Cline, Roo Code, Amazon Q, JetBrains AI, Augment, Kilo Code, Tabnine, Zed, GitHub Copilot, and Aider. Include copy commands and platform-specific notes. Broaden framing from Claude Code-only to any AI coding agent. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * Simplify cross-platform section: add AGENTS.md convention and .skills/ universal folder Fix platform guidance to mention AGENTS.md entrypoint convention, add universal .skills/ folder at project root for Zed/Aider/Copilot/others, and streamline table. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * docs: Pin parallel docs workers feature to Working Memory Added context about PR #55: - --workers N now supports docs-based generation - Architecture: _run_parallel_generation() helper, instance isolation - API change: generate_single() now public - Input validation: clamps workers to max(1, value) Co-Authored-By: Claude Sonnet 4.5 <[email protected]> * feat: Add live streaming to SynthChat result writing Write generated examples to output file immediately as they complete instead of batching in memory. Prevents data loss on process crashes. Changes: - Add StreamingResultWriter class (context manager) - Thread-safe writes via threading.Lock for parallel mode - Metadata header written at generation start - All code paths (docs, parallel, sequential) stream results - Keep _save_results() for potential future use - Fix datetime.utcnow() deprecation in new code Files: SynthChat/run.py (+147, -69 lines) Co-Authored-By: Claude Sonnet 4.5 <[email protected]> --------- Co-authored-by: Claude Sonnet 4.5 <[email protected]>

commit 8d64dd5 Merge: a80e883 31da4da Author: ProfSynapse <[email protected]> Date: Sat Feb 14 10:36:50 2026 -0500 Merge main into feat/live-streaming-results Resolves merge conflict in SynthChat/run.py by combining: - Live streaming results (StreamingResultWriter) from this branch - Parallel docs-based generation and _run_parallel_generation() DRY helper from main (PR #55) Key integration decisions: - _run_parallel_generation() accepts optional `writer` param for streaming - All 4 code paths stream to disk: parallel docs, sequential docs, parallel non-docs, sequential non-docs - Uses generate_single() public API and 3-tuple returns from main - Preserves graceful shutdown on interrupts from main Co-Authored-By: Claude Opus 4.6 <[email protected]> commit a80e883 Author: ProfSynapse <[email protected]> Date: Sat Feb 14 10:25:04 2026 -0500 feat: Add live streaming to SynthChat result writing Write generated examples to output file immediately as they complete instead of batching in memory. Prevents data loss on process crashes. Changes: - Add StreamingResultWriter class (context manager) - Thread-safe writes via threading.Lock for parallel mode - Metadata header written at generation start - All code paths (docs, parallel, sequential) stream results - Keep _save_results() for potential future use - Fix datetime.utcnow() deprecation in new code Files: SynthChat/run.py (+147, -69 lines) Co-Authored-By: Claude Sonnet 4.5 <[email protected]> commit 91c5b22 Author: ProfSynapse <[email protected]> Date: Sat Feb 14 10:09:56 2026 -0500 docs: Pin parallel docs workers feature to Working Memory Added context about PR #55: - --workers N now supports docs-based generation - Architecture: _run_parallel_generation() helper, instance isolation - API change: generate_single() now public - Input validation: clamps workers to max(1, value) Co-Authored-By: Claude Sonnet 4.5 <[email protected]> commit 553fd38 Author: ProfSynapse <[email protected]> Date: Sat Feb 14 09:25:27 2026 -0500 Simplify cross-platform section: add AGENTS.md convention and .skills/ universal folder Fix platform guidance to mention AGENTS.md entrypoint convention, add universal .skills/ folder at project root for Zed/Aider/Copilot/others, and streamline table. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> commit ef394f4 Author: ProfSynapse <[email protected]> Date: Sat Feb 14 09:21:43 2026 -0500 Add cross-platform AI coding tool compatibility section to README Document how to use the agent skills with Cursor, Windsurf, Cline, Roo Code, Amazon Q, JetBrains AI, Augment, Kilo Code, Tabnine, Zed, GitHub Copilot, and Aider. Include copy commands and platform-specific notes. Broaden framing from Claude Code-only to any AI coding agent. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> commit 92c4863 Author: ProfSynapse <[email protected]> Date: Sat Feb 14 09:11:42 2026 -0500 Add Claude Code skills for fine-tuning, evaluation, and upload-deployment; rewrite README Create 3 new skills with progressive disclosure architecture (lean SKILL.md + focused reference docs) covering the full pipeline: fine-tuning (SFT/KTO/GRPO, 7 reference docs), evaluation (CLI, scenarios, backends, 5 reference docs), and upload-deployment (GGUF, merging, model cards, 4 reference docs). Update synthetic-data-generation skill with reference docs and helper scripts. Rewrite README as an agentic-first entrypoint with problem/solution framing and progressive disclosure pattern. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Increase MAX_RETRIES from 3 to 6 for better handling of transient OpenRouter rate limits - Add automatic fallback provider chain: OpenRouter → LMStudio → Ollama - Rewrite _call_with_retry() to iterate through providers with exponential backoff per provider - Add _switch_to_fallback_provider() helper that swaps llm_client in engine and services - Fallback client creation uses environment variables (same as primary via LLMConfig.from_env()) - Skip unavailable fallback providers with warning (e.g., missing API key, connection refused) Context: Recent 96-essay generation hit OpenRouter rate limits with 14 failures. With 50 workers: reduced to 5 failures. Increased retries + fallback ensures generation doesn't fail even if OpenRouter is completely down. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Added `fixture_parser.py` to parse workspace fixture information from system prompts, defining the `EnvironmentFixture` class and related functions. - Introduced `local_runtime.py` for a filesystem-backed runtime using a temporary local directory, implementing methods for directory and file operations. - Created `tool_executor.py` to execute parsed tool calls against the environment runtime, supporting various actions and schema-driven execution. - Defined data types in `types.py` for environment validation, including `EnvironmentIssue`, `ExecutedToolCall`, and `EnvironmentValidationResult`. - Developed `validator.py` for high-level environment validation, executing tool calls and validating state assertions against runtime. - Integrated YAML configuration loading for tool schemas and execution settings.

Creates comprehensive essay writing training data from Meditations on Alignment essays. Dataset Structure: - 192 total training examples (96 essays × 2 conversation pairs) - Pair 1: User brainstorm → Assistant structured outline - Pair 2: User feedback → Assistant full essay with frontmatter Generation Pipeline: 1. Extracted outlines from original essays 2. Paired outlines with cleaned essays (removed dataview blocks, Obsidian syntax) 3. Generated synthetic user feedback using SynthChat scenario 4. Split into 2-turn conversation pairs for SFT training Token Distribution: - 93% under 4K tokens - Avg pair 1: 905 tokens (brainstorm → outline) - Avg pair 2: 2,926 tokens (feedback → essay) Files: - Datasets/essay_datasets/essay_2turn_pairs.jsonl (final dataset) - SynthChat/scenarios/essay_feedback.yaml (feedback generation scenario) - scratch/essay_dataset/*.py (processing scripts) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Fix config.yaml for LiquidAI/LFM2.5-1.2B-Instruct: set load_in_4bit: false (LIV conv blocks incompatible with bnb-4bit), correct LoRA target_modules (out_proj/in_proj/w1/w2/w3), r=16/alpha=16/dropout=0, linear scheduler, warmup_ratio=0.02, batch_size=2, max_seq=4096 - Add validate_model_compatibility() to train_sft.py: extensible MODEL_COMPATIBILITY_RULES registry detects LFM2-family models at startup and warns if load_in_4bit=true or wrong LoRA target_modules are configured — runs before model load so crash is prevented - Update fine-tuning skill docs (model-presets.md, training-config.md, troubleshooting.md) with LFM2.5 architecture-specific overrides and SIGABRT/exit-code-6 troubleshooting entry Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…unPod - Add CloudTrainHandler with provider selection menu, cost estimates, and graceful degradation for uninstalled SDKs - Add HFJobsBackend with UV/PEP 723 script wrapping (uses existing HF_TOKEN) - Add ModalBackend with OAuth flow + dual volume caching for model weights - Add RunPodBackend with pod lifecycle management and always-terminate safety - Add base_cloud.py: shared helpers (load_cloud_config, poll_until_done, GPU pricing) - Add cloud_config.yaml: budget/standard/performance GPU tiers across all providers - Add Trainers/cloud/: modal_train.py standalone app and runpod_sync.py utilities - Add requirements-cloud.txt for optional cloud dependencies - Wire cloud command into tuner CLI and main menu All three backends conditionally register — local-only users unaffected. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Security: - Replace GH_TOKEN value embedding with $GH_TOKEN shell var in RunPod startup command and runpod_sync.py clone URL (credential leak fix) - Scrub credential URLs from Modal git clone stderr before logging API correctness: - Rewrite HF Jobs execute() to use correct run_job(image, command, flavor) signature; replace _build_uv_script() with _build_training_command() - Fix job.id and status_obj.stage attribute access for HF Jobs API Billing safety: - Add ERROR state detection in RunPod _poll_training (prevents 6hr timeout) - Add 3-attempt retry with backoff in _terminate_pod (prevents billing leak) - poll_until_done now raises immediately on persistent errors (auth/not-found) - Add timeout to Modal subprocess.Popen.wait() with graceful kill Type correctness: - modal_backend.load_config() now returns CloudTrainingConfig - runpod_backend.load_config() now returns CloudTrainingConfig Other: - train_modal.py: .options() → .with_options() for Modal >= 0.73.0 - modal_backend: use shared load_cloud_config() instead of inline YAML parse - cloud_config.yaml: default cloud_type COMMUNITY → SECURE (preemption safety) - requirements-cloud.txt: add python-dotenv Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…al subprocess timeout - base_cloud.py: Add consecutive-error counter (max 3) to poll_until_done. Persistent errors (unauthorized, not found, forbidden, invalid) raise immediately. Too many consecutive transient errors also raise instead of polling silently for hours. - modal_backend.py: Add timeout to subprocess.Popen.wait() based on timeout_hours from cloud config. Kills process on TimeoutExpired. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

- runpod_backend: properly add ERROR+FAILED to _poll_training terminal states (prevents 6hr timeout on failed pods) - runpod_backend: replace single-attempt terminate with 3-attempt retry loop with exponential backoff (1s, 2s, 4s) and CRITICAL log on failure - train_modal.py: confirm stderr URL scrubbing present (re.sub redaction) poll_until_done circuit breaker and Modal subprocess timeout were confirmed present in prior commit b198c9f. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…oud backends 122 tests, 89% overall coverage (96-100% on critical billing safety paths). Tests cover poll_until_done circuit breaker, RunPod pod lifecycle + terminate retry, ERROR/FAILED state detection, Modal subprocess timeout, HF Jobs API, and GH_TOKEN credential isolation. Run: python -m pytest tests/cloud/ -v Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

- Consolidate GPU pricing into cloud_config.yaml (single source of truth) - modal_backend: use shared load_cloud_config() instead of inline YAML parse - train_modal: .options() → .with_options() for Modal >= 0.73.0 API - runpod_sync: remove duplication with base_cloud, import shared helpers - CloudTrainHandler: wire gpu_tiers from cloud_config.yaml (dynamic, not hardcoded) - build_training_startup_command: validate method param (sft/kto only) - runpod validate_environment: stronger RUNPOD_API_KEY format check - runpod/sync: conditional $GH_TOKEN@ injection (only when GH_TOKEN is set) - train_modal: Secret.from_dict() with explicit keys instead of from_dotenv() Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…g, type fixes, validation 1. Move GPU pricing from hardcoded Python dict to cloud_config.yaml (DRY) 2. Modal Secret.from_dict for env vars instead of individual from_name calls 3. Type annotation fixes (Optional return types, str hints) 4. runpod_sync imports resolve_repo_url from base_cloud (eliminates duplicate) 5. CloudTrainHandler reads method labels and gpu_tiers from YAML 6. Method validation guard in build_training_startup_command 7. RunPod API key validation: 32+ chars with alpha check 8. GH_TOKEN clone guard: only inject when token is actually set 9. Modal stderr capture via subprocess.PIPE Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* feat(evaluator): add optional LLM-as-judge to evaluation pipeline Introduces a reusable shared/judge/ module and integrates LLM-as-judge scoring into the Evaluator alongside existing pattern matching. The judge can be composed with pattern matching using AND, OR, or judge_only modes. New: shared/judge/ module (generic, reusable) - models.py: RubricDef, JudgeScore, JudgeResult, JudgeConfig dataclasses - rubric_loader.py: Load YAML rubric files -> RubricDef instances - schema_builder.py: Merge rubric output_schemas into combined JSON schema - judge_service.py: Execute LLM judge via BaseLLMClient.structured_output() - interaction_logger.py: Thread-safe JSONL logging for KTO training New: Evaluator/judge_validator.py - JudgeValidator: renders evaluator-specific template vars, calls JudgeService - JudgeValidationResult: result dataclass with judge_mode New: Evaluator/config/rubrics/ - tool_call_quality.yaml: judges tool selection and argument correctness - response_appropriateness.yaml: judges response clarity and helpfulness Modified: Evaluator integration - runner.py: EvaluationRecord.judge field, AND/OR/judge_only status logic, AND optimization (skip judge call if pattern match already fails) - config.py: EvalJudgeConfig dataclass attached to EvaluatorConfig - reporting.py: judge stats in console/markdown/JSON output Note: Evaluator/cli.py (--judge* flags) committed separately due to pre-existing false positive in hook security scan (line 625). 72/72 unit tests pass. Architecture doc: docs/architecture/llm-judge-integration.md Co-Authored-By: Claude Sonnet 4.6 <[email protected]> * feat(evaluator): implement LLM-as-judge evaluation with configurable options * fix(cli): clarify HuggingFace authentication message for better user guidance --------- Co-authored-by: Claude Sonnet 4.6 <[email protected]>

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

…tifact paths

- Implement `filter_lora_adapter.py` to filter LoRA adapter directories based on tensor key substrings. - Create `manage_space.py` for rendering, deploying, and managing Hugging Face Spaces with support for various configurations. - Add Dockerfile and README template for the `vllm_warm` space, including entrypoint and sync script for adapter management. - Introduce `manual_lora_merge.py` for merging LoRA deltas into models locally. - Add tests for filtering LoRA adapters and managing spaces, ensuring functionality and correctness.

- Introduced `06_migrate_cli_schema_datasets.py` to migrate non-thinking datasets to the CLI-oriented schema, including argument normalization and command rendering. - Added `08_inventory_synthchat_cli_schema_refs.py` to inventory SynthChat configuration references needing CLI-schema alignment, with special pattern detection. - Created `cli_schema_rules.py` for classification rules used in the migration pipeline, defining in-scope agents and tool classifications. - Implemented `cli_schema_utils.py` with shared helper functions for dataset migration, including loading catalogs, validating call shapes, and rendering CLI commands.

Align SynthChat and Evaluator with config-driven CLI formats

…tool-datasets Regenerate non-thinking tool datasets and merge latest outputs

- Introduced a new YAML configuration file for SFT training of the Qwen 3.5 9B model. - Set up dataset source and training parameters including batch size, learning rate, and LoRA settings. - Disabled evaluation and loss tracking for initial training phase. - Enabled feature formats for output data.

…ontainer mode Adds a new `local-run` command that runs SFT/KTO inside Docker on a local GPU without the usual UID/GID permission headaches, with the asciimatics dashboard visible inside the container, and an optional persistent-container mode that caches pip installs, HuggingFace model downloads, and triton compile output across repeat runs. Three workstreams, all config-driven via `Trainers/local/jobs/*.yaml`: 1. **UID-agnostic** — defaults to `-u 0:0` inside the container with a bash trap that chowns artifacts back to the host uid/gid on EXIT. Handles bind and copy transfer modes; detects WSL drvfs and warns when POSIX metadata isn't enabled. 2. **TTY-aware** — allocates `-i -t` when stdout is a tty so the asciimatics dashboard renders inside the container. `job.tty: auto|always|never`. 3. **Persistent container** — `job.persist: true` keeps a long-lived container per job config so repeat runs skip pip install + model download. Uses `--init` (tini) for clean ctrl-C signal propagation, plus HF and pip cache mounts. Managed via `--stop`, `--rm-persistent`, `--container-status`. Also rolls up a small cleanup of stale `Trainers/rtx3090_sft` / `rtx3090_kto` references in docstrings and docs to match the actual `Trainers/sft` / `Trainers/kto` layout, and removes the fully orphaned `docs/prep/local-training/rtx3090-kto-finetuning.md`. Reference: `Trainers/local/jobs/qwen35_2b_sft_smoke.yaml` (fast smoke config) and `qwen35_2b_sft_2epoch.yaml` (full 2-epoch SFT). Config keys documented in `.skills/fine-tuning/reference/training-config.md`. Troubleshooting in `docs/troubleshooting.md`. Tests: 148 unit tests covering uid-agnostic helpers, TTY resolution, persistent container lifecycle, cache mounts, and pip marker-hash guard. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

feat(local-run): uid-agnostic local Docker training + persistent-container mode

…ndings Skill content updates (all three synced trees: .skills, .agents/skills, .claude/skills): - `fine-tuning/SKILL.md` gains a "Local Docker config run" row and a preference note for `python tuner.py local-run --job-config <yaml>` over ad-hoc `docker run` for repeatable local GPU training; adds `Trainers/local/jobs/` + `Trainers/archive/legacy_rtx3090/` to the directory map; records the 2026-04-22 `unsloth/unsloth:latest` digest + transformers pin guidance for Qwen3.5. - `fine-tuning/reference/sft-training.md` gains a config-driven local Docker SFT smoke-run example pointing at `Trainers/local/jobs/qwen35_2b_sft_smoke.yaml`. - `fine-tuning/reference/grpo-training.md`, `upload-deployment/*` — small path updates (`rtx3090_sft` / `kto_output_rtx3090` -> `sft` / `kto_output`) to match the actual `Trainers/` layout. Docs path cleanup: - QUICKSTART, INSTALLATION_GUIDE, PROJECT_OVERVIEW, EVOLUTIONARY_FINETUNING, SYNTH_CHAT_*, NEBIUS_*, ml-pipeline-*, etc. — stale `rtx3090_sft` / `rtx3090_kto` / `kto_output_rtx3090` references swapped for the canonical `sft` / `kto` / `kto_output` paths. Line-ending hygiene: - Many docs + skill mirrors were stored in HEAD with CRLF. Working tree has been LF for a while; this commit normalizes them in the index so the working tree and repo agree. `.gitattributes` already specified `* text=auto` — this just cleans up files that were committed pre-normalization. Verified: `python3 .skills/scripts/sync_skill_trees.py --check` reports "Skill trees are in sync." Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chore(skills+docs): document local-run in skills, normalize LF line endings

- Added PrivacyPreprocessor for orchestrating privacy-related text processing. - Introduced Pseudonymizer for replacing sensitive information with synthetic values. - Created a smoke runbook for testing privacy features with synthetic data. - Developed unit tests for privacy preprocessing and pseudonymization functionalities. - Included sample privacy fixture documents and JSONL datasets for testing. - Enhanced existing services to integrate privacy preprocessing and pseudonymization.

…#85) * fix(trainers): epoch counter in TUI dashboard + DRY callback refactor Two related changes bundled: 1. Fix: HuggingFace Trainer emits `logs['epoch']` as a float; three callback classes were casting it to int before `dashboard.update(epoch=...)`, truncating sub-epoch progress to 0 until each full epoch completed. Cast is now `float(...)`, matching the dashboard's internal type and the JSONL log-reader path. 2. Refactor: the duplication surfaced by the fix (identical bug in three places) motivated a DRY pass. Extracted a shared `Trainers/shared/ callbacks/` package — BaseMetricsCallback + BaseLiveDashboardCallback, HealthChecker strategy with SFT/KTO/NoOp concrete subclasses, hoisted TwoStageLRCallback and CheckpointMonitorCallback. Per-trainer `training_callbacks.py` files reduced to thin subclassing shims that re-export the same public symbols at the same paths — zero caller edits. Net LOC: -383 overall (callback files -1066, shared package +683). 21 new unit tests (tests/trainers/test_callback_refactor.py) cover the four HIGH/MEDIUM uncertainties from the architect + coder HANDOFFs. Intentional additive behavior change: KTO and GRPO now gain env-fallback cloud-provider resolution (CLOUD_PROVIDER env var takes precedence over getattr(args, "cloud_provider")). SFT behavior unchanged. See design doc §6 risk matrix at docs/architecture/training-callbacks-refactor.md. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * fix(trainers): remediate B1+B2 from PR #85 review — SFT cadence + GRPO dict-merge Addresses two blocking behavior regressions surfaced by the architect + backend-coder review of the callback refactor: B1 — SFT cadence drift: health_checker.check() and last_log_time update were firing every on_log instead of only at interval-gated steps for SFT. Pre-refactor SFT early-returned on the modulo before both. Fixed via two new class attrs on BaseMetricsCallback: health_check_every_on_log: bool = True # KTO/GRPO default interval_time_updates_every_on_log: bool = True # KTO/GRPO default SFT's MetricsTableCallback overrides both to False — the two lines now only run inside the should_write_jsonl branch for SFT. KTO/GRPO unchanged (both originals called health-check and updated last_log_time every on_log). B2 — GRPO dict-merge precedence: original GRPO built the JSONL row with our-fields-win semantics (`entry = dict(logs); entry[k]=v; entry.update(cap)`); SFT and KTO originals used logs-win (`{**our_fields, **capacity, **logs}`). New base was uniformly logs-win, silently flipping GRPO. Fixed via per-trainer class attr: fields_win_on_collision: bool = False # SFT/KTO default GRPO's LiveDashboardCallback overrides to True. _write_log_row branches on the attr to emit the correct spread order per trainer. All three trainers now preserve pre-refactor behavior byte-exact for key collisions (no current HF log key collides with our fields in default training, so practical impact is zero, but the refactor must not silently change semantics). Design doc §6a + §6b updated with correction notes acknowledging the earlier "unify on SFT/KTO style" HANDOFF was based on a misread of the pre-refactor GRPO build order. Test updates: TestDictMergeOrder split into per-trainer assertions (GRPO fields-win, KTO logs-win). All 22 tests green. Review reports included under docs/review/pr-85-*.md. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * refactor(trainers): PR #85 minor/future remediation (code + tests) Code (backend-coder-reviewer): - M-A: drop duplicate banner line in BaseMetricsCallback.on_train_end - M-D: add log_write_swallow_errors to BaseLiveDashboardCallback JSONL write - M-E: rename per-trainer module docstrings ('shims' -> concrete subclasses) - M-F: consolidate sys.path.insert into Trainers/shared/callbacks/__init__.py - M-G: document _annotate_cloud dual-call-site contract - M-H: document total_epochs=1 sentinel - F-E: remove no-op CheckpointMonitorCallback.on_save - F-F: cosmetic cleanup (redundant NoOpHealthChecker assignment, return/pass) Tests (test-engineer): - M-J: _dashboard_metrics fallback coverage (KTO kl, GRPO reward chain) - F-A: HealthChecker output-format snapshot tests - F-B: capture_runtime_capacity_snapshot GPU-branch tests - F-C: suppress_training_logs context-manager tests - F-D: SFT JSONL shape-parity baseline (strip-list + type-stability) 52/52 test_callback_refactor.py pass. All post-review minor/future items addressed per user decisions from peer-review workflow. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

…-option feat: add privacy preprocess and OPF integration

[codex] Make evaluator assertions config driven

Consolidate training configs from Trainers/local/jobs/ and Trainers/cloud/jobs/ into a single Trainers/recipes/ directory. - Add tuner/discovery/recipes.py with RecipeMeta, list_recipes(), and load_recipe() (supports target:both deep-merge) - Migrate 16 recipe YAMLs via git mv, adding target: and method: fields - Update local_run_handler and cloud_run_handler to use discovery module - Add local-run and cloud-run entries to TUI main menu - Update 27 references across skills, docs, READMEs, and tests - Fix stale hf_jobs_hardware.py reference in cloud-training.md - Sync skill mirrors via sync_skill_trees.py Co-Authored-By: Claude Opus 4.6 <[email protected]>

- Add tests/discovery/test_recipes.py with 34 tests covering: list_recipes filtering (target, method, combined), robustness (malformed YAML, missing fields, non-dict files), load_recipe deep-merge (sub-block precedence, list replacement, nesting), handler integration (path verification), reference completeness - Fix stale CLI help text in parser.py:119-120 referencing old Trainers/local/jobs/ and Trainers/cloud/jobs/ paths Co-Authored-By: Claude Opus 4.6 <[email protected]>

Move `from huggingface_hub import sync_bucket` from top-level to inside `pull_artifacts()` where it's actually used. The top-level import killed the entire CLI (TUI, local-run, cloud-run) in conda environments where huggingface_hub doesn't export sync_bucket (e.g., unsloth_latest with huggingface_hub 0.36.0). Co-Authored-By: Claude Opus 4.6 <[email protected]>

fix: lazy-import sync_bucket to unblock CLI

feat: unified training recipe system

Snapshot of in-progress work prior to merging origin/main (recipe system). To be reorganized into proper commits later. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…ven-assertions # Conflicts: # .agents/skills/fine-tuning/SKILL.md # .claude/skills/fine-tuning/SKILL.md # .skills/fine-tuning/SKILL.md # .skills/fine-tuning/reference/cloud-training.md

ProfSynapse and others added 30 commits January 2, 2026 12:24

feat: implement SynthChat handler for data generation and improvement…

3a9cd88

…; update CLI commands and menus

feat: enhance SynthChat handler with new functionality and improve CL…

5e722c2

…I UI; add project configuration files

fix: correct shared UI import path in tuner/ui/menu.py

98a030d

Was looking for Trainers/shared/ui/ but shared UI is at shared/ui/ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

feat: update response parser imports and enhance WebLLM handler for m…

66cf01f

…odel size matching

added pact

fedbcd6

chore: remove PACT-prompt subproject reference

6567e00

updated animation to not break on resize

f057ba2

Refactor code structure for improved readability and maintainability

b977f98

docs(claude): pin pre-commit hook gotcha and shared/judge/ module

3838645

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

fix: update .gitignore to include additional archive and generated ar…

79bde79

…tifact paths

ProfSynapse and others added 29 commits April 21, 2026 07:13

Align SynthChat and Evaluator with config-driven CLI formats

c02c996

Regenerate non-thinking tool datasets and merge latest outputs

94bd592

Merge pull request #82 from ProfSynapse/codex/cli-tool-format-alignment

9ab97d3

Align SynthChat and Evaluator with config-driven CLI formats

Merge pull request #81 from ProfSynapse/codex/regenerate-nonthinking-…

cb7181f

…tool-datasets Regenerate non-thinking tool datasets and merge latest outputs

Merge pull request #83 from ProfSynapse/fix/local-run-uid-agnostic

421ecef

feat(local-run): uid-agnostic local Docker training + persistent-container mode

Merge pull request #84 from ProfSynapse/chore/skill-docs-sync-local-run

766c486

chore(skills+docs): document local-run in skills, normalize LF line endings

feat(synthchat): add implementation plan for privacy seed sanitization

ef2cc56

docs(skills): document privacy runtime setup

7bf54d3

Merge pull request #86 from ProfSynapse/feat/integrate-privacy-filter…

a856433

…-option feat: add privacy preprocess and OPF integration

Make evaluator assertions config driven

14e3bce

Merge PR #87: Make evaluator assertions config driven

47d7143

[codex] Make evaluator assertions config driven

Add LLM-judge reward to GRPO (#88)

2998148

Add workspace multistep GRPO smoke config

303e0c0

Add workspace multistep GRPO smoke config (#89)

931809a

Merge pull request #90 from ProfSynapse/fix/lazy-sync-bucket

0c673f7

fix: lazy-import sync_bucket to unblock CLI

Merge pull request #91 from ProfSynapse/feat/unified-training-recipes

99b9804

feat: unified training recipe system

WIP: skills, GRPO env training, eval/synthchat refinements

9deb94d

Snapshot of in-progress work prior to merging origin/main (recipe system). To be reorganized into proper commits later. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Merge remote-tracking branch 'origin/main' into codex/eval-config-dri…

fc91e6d

…ven-assertions # Conflicts: # .agents/skills/fine-tuning/SKILL.md # .claude/skills/fine-tuning/SKILL.md # .skills/fine-tuning/SKILL.md # .skills/fine-tuning/reference/cloud-training.md

Checkpoint env GRPO data and eval cleanup

a846e33

ProfSynapse force-pushed the main branch from 5824c4f to df8de53 Compare June 22, 2026 20:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Checkpoint config-driven env GRPO eval work#92

Checkpoint config-driven env GRPO eval work#92
ProfSynapse wants to merge 791 commits into
mainfrom
codex/eval-config-driven-assertions

ProfSynapse commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ProfSynapse commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants