CI: data flywheel (fork verification) #1

Open: smowtion wants to merge 10 commits into main from feat/data-flywheel

Conversation

@smowtion

Owner

Intra-fork PR to run CI (lint / test 3.10-3.12 / typecheck / build + wheel-exclusion assertion) on smowtion's fork. Mirror of upstream PR DannyLuna17#8.

smowtion added 10 commits June 13, 2026 23:03
Add opt-in active-learning collector to capture uncertain/failed samples:
- New collection/ module with DataCollector (writes tiles/failures to collect_dir)
- SolverConfig.collect_data flag (default False, zero I/O overhead)
- Integration in RecaptchaSolver and AsyncRecaptchaSolver solve() pipelines
- Collector records uncertain tiles (confidence between thresholds) and failures
- tests/ scaffold: test_collector_scaffold.py, test_data_collector.py
- CI build job asserts the wheel ships collection/ but not training/ tooling
- Document the 4-phase data flywheel implementation + the cv2.error/OSError lesson
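
The opt-in behavior described above (no I/O unless the flag is set, and only uncertain tiles saved) could look roughly like the following. This is a minimal sketch, not the project's DataCollector: the class name `CollectorSketch`, the `record_tile` method, the `low`/`high` threshold values, and the on-disk layout are all assumptions made for illustration.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class CollectorSketch:
    """Hypothetical stand-in for DataCollector; all names here are assumptions."""

    collect_dir: Path
    enabled: bool = False  # plays the role of SolverConfig.collect_data
    low: float = 0.3       # assumed lower confidence threshold
    high: float = 0.7      # assumed upper confidence threshold

    def record_tile(self, tile: bytes, label: str, confidence: float) -> Optional[Path]:
        """Save the tile only if collection is on and the prediction is uncertain."""
        if not self.enabled:
            return None  # disabled: return before any filesystem access
        if not (self.low <= confidence <= self.high):
            return None  # confident either way: not collected
        out_dir = self.collect_dir / "tiles" / label
        out_dir.mkdir(parents=True, exist_ok=True)
        index = len(list(out_dir.glob("*.bin")))
        path = out_dir / f"{index:06d}.bin"
        path.write_bytes(tile)
        meta = {"label": label, "confidence": confidence}
        path.with_suffix(".json").write_text(json.dumps(meta))
        return path


# Usage: nothing is written until the flag is turned on.
root = Path(tempfile.mkdtemp())
skipped = CollectorSketch(root).record_tile(b"raw", "car", 0.5)
saved = CollectorSketch(root, enabled=True).record_tile(b"raw", "car", 0.5)
```

Checking the flag first keeps the default path free of filesystem calls, which is one way to get the "zero I/O overhead" property mentioned in the commit.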
…ll 4x4 classification fallback

- Add YOLODetector.is_supported() to classify challenges as supported (COCO) or unsupported (fallback-capable); skip unsupported challenges with minimal latency using a short _reload_challenge delay
- Refactor the solve loop (sync and async kept symmetric) with separate attempts/skips budgets and a solved flag; on classification failure, fall back to a short token wait instead of hanging for the full timeout
- Remove _get_target_class; move logic inline to clarify control flow
- Implement per-cell 4x4 classification fallback in SquareCaptchaHandler: when COCO model lacks target (e.g., stairs, bridges), classify each cell independently using the 57k classification model as a last resort
- Add test_is_supported.py (device/model coverage); test_square_handler_fallback.py (cell classification, missing-class recovery)
- Add retry-until-solvable integration test (bounded N=3, wall-clock 180s, still asserts token); it now passes in 279s (previously failed after 578s)
- Update codebase-summary.md with solve robustness + square handler fallback notes
- Fix test_class_mapping.py SIM300 ruff violation
- Add executed plan + brainstorm report to plans/

Verified: 117 unit tests pass; integration test now stable. Sync+async symmetric; public API unchanged.
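
The separate attempts/skips budgets in the refactored solve loop can be sketched as below. This is an illustration under stated assumptions, not the code in this PR: `solve_with_budgets`, `Challenge`, the callback parameters, and the default budget sizes are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Challenge:
    """Hypothetical challenge record; only the target class matters here."""

    target: str


def solve_with_budgets(
    challenges: Iterable[Challenge],
    is_supported: Callable[[str], bool],
    try_solve: Callable[[Challenge], bool],
    max_attempts: int = 3,  # assumed budget for real solve attempts
    max_skips: int = 5,     # assumed budget for cheap reloads
) -> Tuple[bool, int, int]:
    """Return (solved, attempts_used, skips_used)."""
    attempts = skips = 0
    solved = False
    for challenge in challenges:
        if not is_supported(challenge.target):
            if skips >= max_skips:
                break  # skip budget exhausted
            skips += 1  # a skip is cheap and does not consume an attempt
            continue
        attempts += 1
        if try_solve(challenge):
            solved = True
            break
        if attempts >= max_attempts:
            break  # attempt budget exhausted
    return solved, attempts, skips


queue = [Challenge("stairs"), Challenge("stairs"), Challenge("car")]
result = solve_with_budgets(queue, lambda t: t == "car", lambda c: True)
```

Keeping the two counters apart means a run of unsupported challenges cannot use up the attempts reserved for challenges the detector can actually handle.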
Implement full-image collection, cell-bbox annotation, dataset builder, and
3-tier 4x4 detection priority (COCO → custom detection → per-cell fallback).
Phase 3 (GPU training) deferred. Custom model gated behind optional config;
default off preserves existing behavior.

- DataCollector.record_challenge_image: save 4x4 full images to collected/full/
- SquareCaptchaHandler: hook full-image collection (detector.collector injection)
- YOLODetector: load/verify custom detection model (SHA256), detection+mapping
- SolverConfig: custom_detection_model_path parameter
- annotate_detection_cli.py: interactive cell→bbox annotation CLI
- prepare_detection_dataset.py: YOLO detection dataset builder (images/labels/data.yaml)
- CUSTOM_DETECTION_CLASSES/MAPPINGS: runtime detection class sync
- DETECTION_CLASSES: training/class_mapping dataset builder class registry
- 3-tier priority solve handler with per-cell fallback
- 13 new tests; ruff/mypy clean; 138 tests passing
- docs/training-and-flywheel.md: Tier B documentation

Public API unchanged. No confidential data or credentials.
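
The 3-tier priority (COCO → custom detection → per-cell fallback) amounts to an ordered dispatch. A minimal sketch follows, assuming a hypothetical `choose_tier` helper and an illustrative subset of COCO targets; neither is taken from the PR.

```python
from typing import Collection, Optional

# Illustrative subset only; the project's real class list is not shown in this PR text.
COCO_TARGETS = frozenset({"bicycle", "bus", "car"})


def choose_tier(target: str, custom_classes: Optional[Collection[str]] = None) -> str:
    """Pick the first tier able to handle the target, in priority order."""
    if target in COCO_TARGETS:
        return "coco"
    # The custom tier only exists when a custom model is configured (default: off).
    if custom_classes and target in custom_classes:
        return "custom_detection"
    return "per_cell_fallback"


default_tier = choose_tier("stairs")
custom_tier = choose_tier("stairs", custom_classes={"stairs", "bridge"})
```

With no custom model configured the second branch never fires, so the result matches the previous two-tier behavior, consistent with "default off preserves existing behavior".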
- training/device_utils.resolve_device: CUDA > MPS > CPU auto-detect
- train.py: --device defaults to auto, add --amp/--no-amp (use --no-amp on flaky MPS)
- Apple Silicon (M-series) trains via Metal/MPS; no cloud GPU required for small datasets
- docs + Tier B plan notes: Mac MPS option vs Colab; train_detection.py to reuse device_utils
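
The CUDA > MPS > CPU auto-detect order can be sketched as follows. The function names `pick_device` and `resolve_device_sketch` are hypothetical and this is not the body of training/device_utils.resolve_device; the only PyTorch calls used are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(requested: str = "auto", cuda_available: bool = False,
                mps_available: bool = False) -> str:
    """Pure priority logic: an explicit request wins, otherwise CUDA > MPS > CPU."""
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def resolve_device_sketch(requested: str = "auto") -> str:
    """Probe PyTorch if it is installed; fall back to CPU when it is not."""
    try:
        import torch
    except ImportError:
        return pick_device(requested)
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    has_mps = bool(mps is not None and mps.is_available())
    return pick_device(requested, torch.cuda.is_available(), has_mps)


device = resolve_device_sketch()
```

Splitting the priority rule from the hardware probe keeps the rule testable on machines without a GPU.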
- train_detection.py: YOLO detect trainer (auto CUDA/MPS/CPU, --amp/--no-amp, --resume)
- write_model_card.py: model_card.json sidecar (date/task/classes/sha256)
- collect.py: loop driver to accumulate per-cell + full-4x4 data with progress counts
- class_mapping: enforce DETECTION_CLASSES == types.CUSTOM_DETECTION_CLASSES contract
- pyproject: add onnx/onnxslim to dev extras (export-time only; runtime keeps onnxruntime)
- smoke-verified train_detection on MPS (synthetic data -> best.pt, task=detect)

Real model still needs human-annotated 4x4 data (no auto pseudo-label).
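
A model_card.json sidecar with the four fields listed above (date/task/classes/sha256) could be produced along these lines. This is a sketch, not write_model_card.py itself: the function names, the JSON key spelling, and the choice to place the card next to the weights file are assumptions.

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path
from typing import Sequence


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_model_card_sketch(model_path: Path, task: str, classes: Sequence[str]) -> Path:
    """Write model_card.json beside the model (location is an assumption)."""
    card = {
        "date": date.today().isoformat(),
        "task": task,
        "classes": list(classes),
        "sha256": sha256_of(model_path),
    }
    card_path = model_path.with_name("model_card.json")
    card_path.write_text(json.dumps(card, indent=2))
    return card_path


# Usage with a stand-in weights file.
weights = Path(tempfile.mkdtemp()) / "best.pt"
weights.write_bytes(b"abc")
card_file = write_model_card_sketch(weights, "detect", ["stairs", "bridge"])
```

A loader can recompute the same digest at startup and compare it with the recorded value before trusting the file.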
- auto_annotate_capmonster.py: label collected 4x4 images via CapMonster image mode
  (ComplexImageTask/recaptcha, ~$0.04/1000) -> annotations.jsonl, 0->1-indexed cells
- only the 7 detection classes labeled; resilient per-image (network errors skipped)
- API key via --api-key or env CAPMONSTER_API_KEY (no secret in repo)
- docs: manual vs auto annotation; cap.guru can't do image mode (token only)
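
Resolving the key from `--api-key` first and the `CAPMONSTER_API_KEY` environment variable second can be sketched with the standard library. The helper name `resolve_api_key` and the error message are hypothetical; only the flag and variable names come from the commit message.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_api_key(argv: Optional[Sequence[str]] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from --api-key, else from CAPMONSTER_API_KEY, else exit."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--api-key", default=None)
    args, _unknown = parser.parse_known_args(argv)
    environment = os.environ if env is None else env
    key = args.api_key or environment.get("CAPMONSTER_API_KEY")
    if not key:
        raise SystemExit("no API key: pass --api-key or set CAPMONSTER_API_KEY")
    return key


from_flag = resolve_api_key(["--api-key", "k-flag"], {"CAPMONSTER_API_KEY": "k-env"})
from_env = resolve_api_key([], {"CAPMONSTER_API_KEY": "k-env"})
```

Passing `argv` and `env` explicitly makes the precedence easy to test without touching the real process environment.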
solution.answer is a 16-element bool mask (True = cell has target), not a cell-index
list. The previous code coerced the bools to ints, producing wrong cells ([1,2] for every image).
Now map True flags to 0-indexed cells. Verified end-to-end: collected images ->
auto-labels -> YOLO detection dataset. Caught by spot-checking and noticing identical outputs.
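
The mask-to-cells fix can be illustrated as below. `mask_to_cells` is a hypothetical name, and `coerced_cells` is only one plausible reconstruction of the earlier bug (treating each bool as a 0-indexed cell and shifting it to 1-indexed); the actual previous code is not shown in this PR text.

```python
from typing import List, Sequence


def mask_to_cells(mask: Sequence[bool], grid_cells: int = 16) -> List[int]:
    """Return the 0-indexed positions whose flag is True."""
    if len(mask) != grid_cells:
        raise ValueError(f"expected {grid_cells} flags, got {len(mask)}")
    return [index for index, flag in enumerate(mask) if flag]


def coerced_cells(mask: Sequence[bool]) -> List[int]:
    """Assumed shape of the bug: bools read as cell indices, then made 1-indexed."""
    return sorted({int(flag) + 1 for flag in mask})


mask = [False] * 16
for position in (0, 5, 15):
    mask[position] = True

correct = mask_to_cells(mask)  # positions of the True flags
wrong = coerced_cells(mask)    # collapses to the same pair for any mixed mask
```

Under this reconstruction, any mask containing both True and False yields the same two cells, which would explain why spot-checking surfaced identical outputs across images.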