feat: implement word-click solver using YOLOv8 and Siamese similarity by lifefloating · Pull Request #3 · lifefloating/crackTCaptcha

lifefloating · 2026-04-24T07:06:13Z

Replaced the LLM vision path with faster YOLOv8 detection and Siamese similarity matching for character recognition in word-click challenges.
Introduced a new HTTP server for handling solve requests, improving usability for integrations and concurrent solves.
Added ONNX models for YOLOv8 detection and Siamese matching, along with a bundled font for character rendering.
Implemented a fallback mechanism to legacy ddddocr detection when the primary path is unavailable.
Enhanced dependency management for ONNX Runtime and OpenCV, ensuring compatibility across Python versions.

Summary by Sourcery

Replace the word_click LLM-based solver with a local YOLOv8 + Siamese ONNX pipeline and add a long-running HTTP server for reuse of loaded models.

New Features:

Introduce a YOLOv8 detection + Siamese similarity solver for word_click challenges using bundled ONNX models and font assets.
Add a long-running HTTP server with /solve and /health endpoints to support concurrent, low-latency solving via HTTP.
Provide a dedicated word-click optional dependency extra that installs ONNX Runtime, OpenCV, and ddddocr for the new solver path.

Enhancements:

Refine the word_click pipeline to prefer the local Siamese path and fall back to the legacy ddddocr-based implementation when models or dependencies are unavailable or detection fails.
Add ONNX Runtime execution-provider selection utilities and background warmup logic to optimise model loading and inference performance.
Update CLI, architecture docs, and agent guidelines to describe the new solver architecture, serve mode, and configuration knobs.
Adjust packaging to force-include ONNX models and font files in wheels and sdists, ensuring out-of-the-box availability of local models.

Build:

Extend pyproject optional dependencies with a word-click extra and ensure ONNX model and font assets are included in build artifacts via hatch configuration.

Documentation:

Revise README and docs to document the YOLO+Siamese-based word_click implementation, HTTP serve mode, configuration environment variables, and updated dependency extras.

sourcery-ai · 2026-04-24T07:06:27Z

Reviewer's Guide

Replaces the word_click LLM-vision solver with a local YOLOv8 + Siamese ONNX pipeline, adds a long-running HTTP server with ONNX warmup, bundles required models/fonts into the wheel via a new word-click extra, and updates docs/CLI/config to reflect the new architecture and runtime behavior.

Sequence diagram for updated word_click YOLO+Siamese solver pipeline

sequenceDiagram
    actor User
    participant CLI as cli_main
    participant Core as solve
    participant Pipeline as pipelines_word_click
    participant WordOCR as solvers_word_ocr
    participant Legacy as legacy_ddddocr

    User->>CLI: run crack-tcaptcha solve
    CLI->>CLI: start word_click_warmup thread
    CLI->>Core: solve(appid, max_retries, entry_url)
    Core->>Pipeline: solve_one_attempt(dyn_show_info)
    Pipeline->>WordOCR: locate_chars_by_siamese(bg_bytes, targets)
    activate WordOCR
    WordOCR->>WordOCR: _bytes_to_bgr(bg_bytes)
    WordOCR->>WordOCR: _get_yolo_session()
    WordOCR->>WordOCR: _yolo_detect(bg_bgr)
    alt yolo_bboxes_found
        WordOCR->>WordOCR: _render_char(target)
        WordOCR->>WordOCR: _siamese_score_batch(crops, ref_img)
        WordOCR-->>Pipeline: click_coords[(cx, cy)...]
        Pipeline->>Core: pow + trajectory + verify
        Core-->>CLI: SolveResult(ok, ...)
        CLI-->>User: print or JSON output
    else yolo_error_or_zero_bboxes
        WordOCR-->>Pipeline: raise SolveError
        Pipeline->>Legacy: _fallback_ddddocr(bg_bytes, targets)
        Legacy-->>Pipeline: click_coords[(cx, cy)...]
        Pipeline->>Core: pow + trajectory + verify
        Core-->>CLI: SolveResult(ok_or_false,...)
        CLI-->>User: print or JSON output
    end
    deactivate WordOCR

Sequence diagram for HTTP /solve in the new server

sequenceDiagram
    actor Client
    participant SRV as server_HTTP
    participant H as HttpHandler
    participant EXEC as ThreadPoolExecutor
    participant Core as solve
    participant WC as pipelines_word_click
    participant WO as solvers_word_ocr

    Client->>SRV: POST /solve {appid, retries, entry_url}
    SRV->>H: dispatch request
    H->>H: _check_auth(X-SK)
    alt auth_ok
        H->>EXEC: submit(solve, appid, max_retries, entry_url)
        EXEC->>Core: solve(...)
        Core->>WC: solve_one_attempt(...)
        WC->>WO: locate_chars_by_siamese(bg_bytes, targets)
        WO-->>WC: click_coords
        WC-->>Core: SolveResult
        Core-->>EXEC: SolveResult
        EXEC-->>H: SolveResult
        H-->>Client: 200 JSON(SolveResult + _cost_s)
    else unauthorized
        H-->>Client: 401 {status:error}
    end
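
The `_check_auth(X-SK)` step in the diagram could look roughly like the sketch below. It assumes that an empty secret disables authentication (the guide only says the check is optional) and uses a plain dict for headers; `check_auth` is a hypothetical helper, not the handler method from the PR.

```python
import hmac
from typing import Mapping


def check_auth(headers: Mapping[str, str], expected_sk: str) -> bool:
    """Return True when the request may proceed.

    Assumption: an empty `expected_sk` means no secret is configured,
    so every request is allowed.
    """
    if not expected_sk:
        return True
    supplied = headers.get("X-SK", "")
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_sk.encode("utf-8"))
```

Inside a `BaseHTTPRequestHandler`, `self.headers.get("X-SK", "")` performs a case-insensitive lookup, so the plain dict here is only a simplification for the example.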

Class diagram for new solver and server modules

classDiagram
    class WordOcrSolver {
        <<module>>
        +locate_chars_by_siamese(bg_bytes: bytes, targets: list~str~) list~tuple~int,int~~
        +warmup() void
        -_bytes_to_bgr(byte_data: bytes) ndarray
        -_render_char(char: str) ndarray
        -_yolo_detect(bg_bgr: ndarray) list~tuple~int,int,int,int~~
        -_siamese_score_batch(crops: list~ndarray~, ref: ndarray) list~float~
    }

    class OrtProvider {
        <<module>>
        +resolve_providers() list~str~
        -_BACKEND_MAP dict
        -_AUTO_PRIORITY tuple
    }

    class WordClickPipeline {
        +solve_one_attempt(dyn_show_info: dict) SolveResult
        -_fallback_ddddocr(bg_bytes: bytes, targets: list~str~) list~tuple~int,int~~
    }

    class ServerState {
        <<internal>>
        +executor: ThreadPoolExecutor
        +sk: str
        +providers: list~str~
        +started_at: float
    }

    class HttpHandler {
        <<BaseHTTPRequestHandler>>
        +do_GET() void
        +do_POST() void
        -_send_json(code: int, payload: dict) void
        -_check_auth() bool
        -state: ServerState
    }

    class ServerModule {
        <<module>>
        +run(host: str, port: int, workers: int, sk: str) void
        +main(argv: list~str~) void
        -_warmup_all() list~str~
    }

    class CliModule {
        <<module>>
        +main(argv: list~str~) void
        -_warmup_word_click() void
    }

    WordClickPipeline --> WordOcrSolver : uses
    WordOcrSolver --> OrtProvider : uses
    ServerModule --> ServerState : creates
    ServerModule --> HttpHandler : configures
    ServerModule --> WordOcrSolver : calls warmup
    CliModule --> WordOcrSolver : calls warmup
    CliModule --> ServerModule : dispatch_serve_command
    CliModule --> WordClickPipeline : indirect_via_solve

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Replace word_click LLM-vision + ddddocr pipeline with local YOLOv8 detection and Siamese similarity, keeping ddddocr as a fallback path. | Refactored word_click pipeline to call a new locate_chars_by_siamese helper and only fall back to legacy ddddocr-based matching on failures or zero detections. Removed per-bbox ddddocr OCR matching logic from the pipeline and replaced it with a simpler fallback that delegates fully to legacy match_words. Adjusted logging and error handling so YOLO/Siamese failures emit warnings and trigger the ddddocr path rather than hard errors. | `src/crack_tcaptcha/pipelines/word_click.py` |
| Introduce a new YOLOv8 + Siamese word-click solver with ONNX Runtime provider selection, model warmup, and optimized batching. | Implemented ONNX-based YOLOv8 detection with letterbox preprocessing, NMS, and coordinate unmapping back to the original background image. Implemented Siamese similarity scoring with batched or per-pair inference, shared ONNX sessions, and character rendering via a bundled font. Added a warmup function that preloads sessions and runs dummy inference to hide ORT cold-start, and exposed locate_chars_by_siamese as the public API. Added an ORT provider resolution helper that chooses execution providers from TCAPTCHA_ORT_BACKEND and available backends, always falling back to CPU. | `src/crack_tcaptcha/solvers/word_ocr.py`, `src/crack_tcaptcha/solvers/ort_provider.py` |
| Add a long-running HTTP server entrypoint for solving captchas over HTTP with model warmup and bounded concurrency. | Implemented a stdlib-based HTTP server exposing /solve and /health endpoints, using a thread pool to bound concurrent solve calls. Integrated optional shared-secret authentication via the X-SK header with configuration via environment variables. Captured ONNX provider information and uptime in the /health response and ensured clean shutdown with signal handling and executor teardown. | `src/crack_tcaptcha/server.py` |
| Extend the CLI to support a serve subcommand and background model warmup for one-shot solves. | Added a solve-time background warmup thread that preloads YOLO + Siamese sessions unless explicitly disabled with a flag. Introduced a serve subcommand that delegates to the new server module, wiring host/port/worker arguments and TCAPTCHA_SERVE_SK to the server run function. Updated argument help text and descriptions to distinguish one-shot solve from long-running serve mode. | `src/crack_tcaptcha/cli.py` |
| Bundle YOLO/Siamese ONNX models and font into the distribution and define a dedicated word-click extra with required dependencies. | Declared a new word-click optional dependency set including onnxruntime, opencv-python-headless, and ddddocr, and updated the all extra to include these. Configured hatch wheel/sdist builds to force-include the ONNX models and font files so they are available at runtime without extra downloads. Documented build-time constraints in project configuration to avoid model/file renames that would desynchronize code and packaging. | `pyproject.toml`, `src/crack_tcaptcha/solvers/models/word_click_detector.onnx`, `src/crack_tcaptcha/solvers/models/word_click_matcher.onnx`, `src/crack_tcaptcha/solvers/models/font.ttf` |
| Update README, docs, and agent/developer guides to describe the new word_click architecture, dependencies, environment variables, and serve mode. | Replaced references to LLM vision in word_click with descriptions of the local YOLO + Siamese pipeline and ddddocr fallback in high-level docs and per-pipeline docs. Documented the new word-click extra, bundled models, ONNX provider selection, and performance considerations including macOS CoreML behavior. Explained the new serve mode usage, authentication, environment variables, and how it fits into the overall architecture diagrams and agent guidelines. Updated extras tables and installation instructions to reflect that word_click no longer depends on LLM and that only image_select uses LLM vision. | `README.md`, `docs/word-click.md`, `docs/architecture.md`, `docs/index.md`, `AGENTS.md`, `CLAUDE.md` |
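
The provider resolution described for `ort_provider.py` (pick execution providers from `TCAPTCHA_ORT_BACKEND` and the available backends, always ending with CPU) might be sketched as follows. This is an illustration under assumptions: the backend keywords (`auto`, `cuda`, `coreml`, `cpu`), the priority order, and the `available` parameter are hypothetical; only the environment variable name and the CPU fallback come from the guide. The provider strings are standard ONNX Runtime provider names.

```python
import os
from typing import List, Optional

CPU = "CPUExecutionProvider"

# Hypothetical keyword -> ONNX Runtime provider mapping.
BACKEND_MAP = {
    "cpu": CPU,
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
}
# Hypothetical preference order used when the backend is "auto".
AUTO_PRIORITY = ("CUDAExecutionProvider", "CoreMLExecutionProvider")


def resolve_providers(available: List[str], backend: Optional[str] = None) -> List[str]:
    """Choose providers for an InferenceSession, always ending with CPU."""
    if backend is None:
        backend = os.environ.get("TCAPTCHA_ORT_BACKEND", "auto")
    backend = backend.strip().lower()

    chosen: List[str] = []
    if backend == "auto":
        chosen = [p for p in AUTO_PRIORITY if p in available]
    else:
        wanted = BACKEND_MAP.get(backend)
        # Unknown or unavailable backends silently degrade to CPU only.
        if wanted is not None and wanted in available:
            chosen = [wanted]
    if CPU not in chosen:
        chosen.append(CPU)
    return chosen
```

In real use the `available` list would typically come from `onnxruntime.get_available_providers()`; taking it as a parameter keeps the sketch runnable without ONNX Runtime installed.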

lifefloating · 2026-04-24T07:07:12Z

Testing passes, but in practice it is still not very fast; issue #2 needs to be taken into account.

sourcery-ai

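Hey - I've found 4 issues, and left some high level feedback:

The word_ocr.py module docstring still refers to yolo_word.onnx / siamese_word.onnx, but the actual bundled files are word_click_detector.onnx / word_click_matcher.onnx; aligning these names will avoid confusion when debugging model issues.
_render_char recreates the ImageFont.truetype font object on every call, which can be a noticeable overhead when there are many targets; consider caching the loaded font at module scope and reusing it.
In server._warmup_all you import and call _get_yolo_session / _get_siamese_session, which are private helpers; it would be more robust to either expose public accessors for provider info or rely solely on the public warmup() API so refactors of internals don't break the server.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `word_ocr.py` module docstring still refers to `yolo_word.onnx` / `siamese_word.onnx`, but the actual bundled files are `word_click_detector.onnx` / `word_click_matcher.onnx`; aligning these names will avoid confusion when debugging model issues.
- `_render_char` recreates the `ImageFont.truetype` font object on every call, which can be a noticeable overhead when `locate_chars_by_siamese` is called frequently or there are many targets; consider caching the loaded font at module scope and reusing it. You could also cache rendered glyphs per character to avoid re-rendering across calls.
- In `server._warmup_all` you import and call `_get_yolo_session` / `_get_siamese_session`, which are private helpers; it would be more robust to either expose public accessors for provider info or rely solely on the public `warmup()` API so refactors of internals don't break the server.

## Individual Comments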

### Comment 1
<location path="src/crack_tcaptcha/solvers/word_ocr.py" line_range="171-179" />
<code_context>
+    return img
+
+
+def _render_char(char: str) -> np.ndarray:
+    """Render one CJK char to a 52×52 BGR image using the bundled font."""
+    from PIL import Image, ImageDraw, ImageFont
+
+    if not _FONT_PATH.is_file():
+        raise SolveError(f"word_click: missing font at {_FONT_PATH}")
+    img = Image.new("RGB", (_CHAR_RENDER_SIZE, _CHAR_RENDER_SIZE), color="white")
+    draw = ImageDraw.Draw(img)
+    font = ImageFont.truetype(str(_FONT_PATH), _CHAR_RENDER_FONT_SIZE)
+    bbox = font.getbbox(char)
+    text_w = bbox[2] - bbox[0]
</code_context>
<issue_to_address>
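**suggestion (performance):** Avoid reloading the TTF font on every character render to reduce per-request latency.

`_render_char` creates a new `ImageFont.truetype` on every call, which is costly when `locate_chars_by_siamese` runs often or over many targets. Cache the font once (e.g., a module-level `_FONT` with locking if needed) and reuse it. You might also cache rendered glyphs per character to avoid repeated rendering across calls.

Suggested implementation: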

```python
# Cache the TrueType font and rendered glyphs to avoid per-call overhead.
_CHAR_FONT = None
_CHAR_FONT_LOCK = threading.Lock()
_CHAR_GLYPH_CACHE: dict[str, np.ndarray] = {}
_CHAR_GLYPH_CACHE_LOCK = threading.Lock()


def _render_char(char: str) -> np.ndarray:
    """Render one CJK char to a 52×52 BGR image using the bundled font."""
    from PIL import Image, ImageDraw, ImageFont

    global _CHAR_FONT

    if not _FONT_PATH.is_file():
        raise SolveError(f"word_click: missing font at {_FONT_PATH}")

    # Fast path: return cached glyph if available.
    with _CHAR_GLYPH_CACHE_LOCK:
        cached = _CHAR_GLYPH_CACHE.get(char)
    if cached is not None:
        # Return a copy so callers can't mutate the cached image.
        return cached.copy()

    # Lazily initialize and cache the font once.
    with _CHAR_FONT_LOCK:
        if _CHAR_FONT is None:
            _CHAR_FONT = ImageFont.truetype(str(_FONT_PATH), _CHAR_RENDER_FONT_SIZE)
        font = _CHAR_FONT

    img = Image.new("RGB", (_CHAR_RENDER_SIZE, _CHAR_RENDER_SIZE), color="white")
    draw = ImageDraw.Draw(img)
    bbox = font.getbbox(char)
    text_w = bbox[2] - bbox[0]
    x = (_CHAR_RENDER_SIZE - text_w) // 2
    y = -3
    draw.text((x, y), char, fill=_CHAR_RENDER_COLOR, font=font)
    cv2 = _import_cv2()
    bgr_img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)

    # Store in cache for subsequent calls.
    with _CHAR_GLYPH_CACHE_LOCK:
        _CHAR_GLYPH_CACHE[char] = bgr_img

    return bgr_img.copy()

```

1. Make sure `threading` is imported at the top of `src/crack_tcaptcha/solvers/word_ocr.py`:
   - If it is not imported yet, add `import threading`.
2. If the project must support Python versions older than 3.9, where the `dict[str, np.ndarray]` annotation is not supported,
   replace `_CHAR_GLYPH_CACHE: dict[str, np.ndarray] = {}` with a compatible form, for example `from typing import Dict` and `Dict[str, np.ndarray]`.
</issue_to_address>

### Comment 2
<location path="src/crack_tcaptcha/server.py" line_range="136" />
<code_context>
+        if not appid:
+            self._send_json(400, {"status": "error", "msg": "missing appid"})
+            return
+        retries = int(body.get("retries", body.get("max_retries", 3)))
+        entry_url = body.get("entry_url", "")
+
</code_context>
<issue_to_address>
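**issue (bug_risk):** Validate `retries` from the request body before parsing it, so that bad input does not cause a 500.

Parsing `retries` directly with `int(...)` raises `ValueError` on non-integer input (such as "many" or a float string like "2.5"), which turns what should be a client error into a 500. Validate `retries` first, and return a 400 with a clear error message when it is not a positive integer or is outside the allowed range.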
</issue_to_address>
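
One way to apply the validation suggested in Comment 2 is sketched below. It is a hedged example rather than the fix adopted in the PR: `parse_retries` and the `max_allowed=10` ceiling are assumptions, and this version accepts only real JSON integers (not numeric strings).

```python
from typing import Optional, Tuple


def parse_retries(
    body: dict, default: int = 3, max_allowed: int = 10
) -> Tuple[Optional[int], Optional[str]]:
    """Return (retries, None) on success or (None, error_message) on bad input.

    The caller would answer with HTTP 400 and the message when the
    second element is not None.
    """
    raw = body.get("retries", body.get("max_retries", default))
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        return None, "retries must be an integer"
    if not 1 <= raw <= max_allowed:
        return None, f"retries must be between 1 and {max_allowed}"
    return raw, None
```

Returning an error message instead of raising keeps the handler's branch simple: send a 400 with the message, or continue with the validated value.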

### Comment 3
<location path="AGENTS.md" line_range="28" />
<code_context>
 # Install with optional extras
-uv sync --extra icon-click   # adds ddddocr + onnxruntime (needed for icon_click and word_click)
+uv sync --extra icon-click   # ddddocr + onnxruntime (icon_click pipeline)
+uv sync --extra word-click   # onnxruntime + opencv-headless + ddddocr (word_click pipeline, local YOLO+Siamese)
 uv sync --extra dev          # pytest, respx, ruff, hypothesis
 uv sync --extra docs         # mkdocs-material
</code_context>
<issue_to_address>
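**issue (typo):** The package name here should most likely be `opencv-python-headless`, and `YOLO + Siamese` could have spaces around the plus sign.

In other docs (such as the README extras table and `docs/word-click.md`), this extra is written as `opencv-python-headless`; using `opencv-headless` here is inconsistent and may mislead users into using the wrong name when installing the extra. Also, for consistency with other places, `YOLO+Siamese` could be changed to `local YOLO + Siamese`.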

```suggestion
uv sync --extra word-click   # onnxruntime + opencv-python-headless + ddddocr (word_click pipeline, local YOLO + Siamese)
```
</issue_to_address>

### Comment 4
<location path="src/crack_tcaptcha/solvers/word_ocr.py" line_range="268" />
<code_context>
+    return arr[None, ...]
+
+
+def _siamese_score_batch(crops: list[np.ndarray], ref: np.ndarray) -> list[float]:
+    """Score every crop against the ref in one (or as few as possible) ORT calls.
+
</code_context>
<issue_to_address>
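**issue (complexity):** Consider refactoring `_siamese_score_batch` and the greedy assignment loop: centralize preprocessing, isolate the batch-support detection logic, and select indices in a more declarative way, for clearer and less error-prone control flow.

You can simplify two relatively complex parts, `_siamese_score_batch` and the greedy assignment logic, while keeping the current behavior unchanged.

---

### 1) Simplify the structure of `_siamese_score_batch`

The function currently does both of the following:

* Detects whether batching is supported.
* Preprocesses the crops twice (once in the batch `try` branch and once in the per-pair path).

You can reduce branching and duplicated work by:

* Preprocessing all crops once up front.
* Extracting the "try batching once, then cache the result" logic into a small inner block.
* Keeping the behavior of `_siamese_batch_supported` exactly the same.

This preserves dynamic detection plus fallback while making the main flow easier to read and cheaper to run.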

```python
def _siamese_score_batch(crops: list[np.ndarray], ref: np.ndarray) -> list[float]:
    global _siamese_batch_supported

    if not crops:
        return []

    sess = _get_siamese_session()
    assert _siamese_input_names is not None
    n0, n1 = _siamese_input_names

    ref_prepped = _prep_siamese(ref)           # (1, 3, 52, 52)
    prepped = [_prep_siamese(c) for c in crops]  # list of (1, 3, 52, 52)

    # try batched once; cache decision
    if _siamese_batch_supported is not False:
        try:
            batch = np.concatenate(prepped, axis=0)          # (N, 3, 52, 52)
            refs = np.repeat(ref_prepped, len(prepped), 0)   # (N, 3, 52, 52)
            pred = sess.run(None, {n0: batch, n1: refs})[0]
            arr = np.asarray(pred).reshape(-1)
            if arr.size == len(prepped):
                _siamese_batch_supported = True
                return [float(v) for v in arr]
        except Exception as e:
            log.info("word_click siamese batch not supported, using per-pair: %s", e)
            _siamese_batch_supported = False

    # per-pair fallback (same semantics as current code)
    out: list[float] = []
    for p in prepped:
        pred = sess.run(None, {n0: p, n1: ref_prepped})[0]
        out.append(float(np.asarray(pred).reshape(-1)[0]))
    return out
```

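This has several benefits:

* Only one `_prep_siamese` loop is needed.
* Control flow is more linear, with the batch-probing logic clearly isolated.
* Logging and `_siamese_batch_supported` behavior stay the same.

---

### 2) Make the greedy assignment logic more declarative

The current greedy assignment logic manually maintains `best_idx` / `best_score` and rescans when all candidates are used up. Readability can be improved by:

* Maintaining a set of unused indices.
* Precomputing the global best index for each target, for cases where a candidate must be reused.
* Using `max(..., key=...)` instead of a hand-written loop.

This shortens the imperative logic while keeping the original behavior (pick the best unused candidate for each target, otherwise reuse that target's global best).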

```python
    # Full score matrix: rows = targets, cols = crop indices.
    score_matrix: list[list[float]] = []
    for ch in targets:
        ref = _render_char(ch)
        score_matrix.append(_siamese_score_batch(crops, ref))

    # Precompute global best index per target for the "reuse best overall" case.
    global_best_idx: list[int] = []
    for scores in score_matrix:
        if not scores:
            global_best_idx.append(-1)
            continue
        global_best_idx.append(max(range(len(scores)), key=scores.__getitem__))

    result: list[tuple[int, int]] = []
    used: set[int] = set(range(len(crops)))  # start with all, then flip logic?
```

It reads better with a set of unused indices:

```python
    result: list[tuple[int, int]] = []
    unused: set[int] = set(range(len(crops)))

    for ti, ch in enumerate(targets):
        scores = score_matrix[ti]

        # best among unused, if any
        if unused:
            best_idx = max(unused, key=scores.__getitem__)
            best_score = scores[best_idx]
        else:
            best_idx = global_best_idx[ti]
            if best_idx < 0:
                raise SolveError(f"word_click: no candidate for target {ch!r}")
            best_score = scores[best_idx]

        if best_idx in unused:
            unused.remove(best_idx)

        result.append(centers[best_idx])
        log.info("word_click: %r → %s (score=%.3f)", ch, centers[best_idx], best_score)
```

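This makes the code easier to understand and modify while keeping the same greedy strategy and reuse semantics.

---

If you consider the current behavior sufficient, these two focused refactors can remove some hand-rolled state management and branching without sacrificing robustness or performance tuning.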
</issue_to_address>
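
The greedy assignment from Comment 4 can be exercised in isolation with plain lists. The sketch below is a simplified stand-in that returns crop indices instead of click centers; `greedy_assign` is a hypothetical name, and the reuse rule (fall back to the target's overall best once every crop is taken) follows the description in the comment.

```python
from typing import List


def greedy_assign(score_matrix: List[List[float]]) -> List[int]:
    """Pick one crop index per target row, preferring unused crops.

    Rows are targets, columns are crops. Once every crop has been
    used, later targets reuse their overall best-scoring crop.
    """
    if not score_matrix:
        return []
    n_crops = len(score_matrix[0])
    if n_crops == 0:
        raise ValueError("no candidate crops to assign")

    unused = set(range(n_crops))
    picks: List[int] = []
    for scores in score_matrix:
        # Search unused crops first; fall back to all crops when none remain.
        pool = unused if unused else range(n_crops)
        best = max(pool, key=scores.__getitem__)
        unused.discard(best)
        picks.append(best)
    return picks
```

Because targets are processed in order, an early target can claim a crop that a later target scores higher on; that is inherent to the greedy strategy described in the review, not something this sketch tries to fix.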

Original comment in English

Hey - I've found 4 issues, and left some high level feedback:

The word_ocr.py module docstring still refers to yolo_word.onnx / siamese_word.onnx, but the actual bundled files are word_click_detector.onnx / word_click_matcher.onnx; aligning these names will avoid confusion when debugging model issues.
_render_char recreates the ImageFont.truetype font object on every call, which can be a noticeable overhead when there are many targets; consider caching the loaded font at module scope and reusing it.
In server._warmup_all you import and call _get_yolo_session / _get_siamese_session, which are private helpers; it would be more robust to either expose public accessors for provider info or rely solely on the public warmup() API so refactors of internals don’t break the server.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `word_ocr.py` module docstring still refers to `yolo_word.onnx` / `siamese_word.onnx`, but the actual bundled files are `word_click_detector.onnx` / `word_click_matcher.onnx`; aligning these names will avoid confusion when debugging model issues.
- `_render_char` recreates the `ImageFont.truetype` font object on every call, which can be a noticeable overhead when there are many targets; consider caching the loaded font at module scope and reusing it.
- In `server._warmup_all` you import and call `_get_yolo_session` / `_get_siamese_session`, which are private helpers; it would be more robust to either expose public accessors for provider info or rely solely on the public `warmup()` API so refactors of internals don’t break the server.

## Individual Comments

### Comment 1
<location path="src/crack_tcaptcha/solvers/word_ocr.py" line_range="171-179" />
<code_context>
+    return img
+
+
+def _render_char(char: str) -> np.ndarray:
+    """Render one CJK char to a 52×52 BGR image using the bundled font."""
+    from PIL import Image, ImageDraw, ImageFont
+
+    if not _FONT_PATH.is_file():
+        raise SolveError(f"word_click: missing font at {_FONT_PATH}")
+    img = Image.new("RGB", (_CHAR_RENDER_SIZE, _CHAR_RENDER_SIZE), color="white")
+    draw = ImageDraw.Draw(img)
+    font = ImageFont.truetype(str(_FONT_PATH), _CHAR_RENDER_FONT_SIZE)
+    bbox = font.getbbox(char)
+    text_w = bbox[2] - bbox[0]
</code_context>
<issue_to_address>
**suggestion (performance):** Avoid reloading the TTF font on every character render to reduce per-request latency.

`_render_char` creates a new `ImageFont.truetype` on every call, which is costly when `locate_chars_by_siamese` runs often or over many targets. Cache the font once (e.g., a module-level `_FONT` with locking if needed) and reuse it. You might also cache rendered glyphs per character to avoid repeated rendering across calls.

Suggested implementation:

```python
# Cache the TrueType font and rendered glyphs to avoid per-call overhead.
_CHAR_FONT = None
_CHAR_FONT_LOCK = threading.Lock()
_CHAR_GLYPH_CACHE: dict[str, np.ndarray] = {}
_CHAR_GLYPH_CACHE_LOCK = threading.Lock()


def _render_char(char: str) -> np.ndarray:
    """Render one CJK char to a 52×52 BGR image using the bundled font."""
    from PIL import Image, ImageDraw, ImageFont

    global _CHAR_FONT

    if not _FONT_PATH.is_file():
        raise SolveError(f"word_click: missing font at {_FONT_PATH}")

    # Fast path: return cached glyph if available.
    with _CHAR_GLYPH_CACHE_LOCK:
        cached = _CHAR_GLYPH_CACHE.get(char)
    if cached is not None:
        # Return a copy so callers can't mutate the cached image.
        return cached.copy()

    # Lazily initialize and cache the font once.
    with _CHAR_FONT_LOCK:
        if _CHAR_FONT is None:
            _CHAR_FONT = ImageFont.truetype(str(_FONT_PATH), _CHAR_RENDER_FONT_SIZE)
        font = _CHAR_FONT

    img = Image.new("RGB", (_CHAR_RENDER_SIZE, _CHAR_RENDER_SIZE), color="white")
    draw = ImageDraw.Draw(img)
    bbox = font.getbbox(char)
    text_w = bbox[2] - bbox[0]
    x = (_CHAR_RENDER_SIZE - text_w) // 2
    y = -3
    draw.text((x, y), char, fill=_CHAR_RENDER_COLOR, font=font)
    cv2 = _import_cv2()
    bgr_img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)

    # Store in cache for subsequent calls.
    with _CHAR_GLYPH_CACHE_LOCK:
        _CHAR_GLYPH_CACHE[char] = bgr_img

    return bgr_img.copy()

```

1. At the top of `src/crack_tcaptcha/solvers/word_ocr.py`, ensure `threading` is imported:
   - Add `import threading` if it is not already present.
2. If the project targets Python versions earlier than 3.9, where `dict[str, np.ndarray]` cannot be evaluated as a runtime annotation, import `Dict` from `typing` and annotate the cache as `_CHAR_GLYPH_CACHE: Dict[str, np.ndarray] = {}`.
</issue_to_address>

### Comment 2
<location path="src/crack_tcaptcha/server.py" line_range="136" />
<code_context>
+        if not appid:
+            self._send_json(400, {"status": "error", "msg": "missing appid"})
+            return
+        retries = int(body.get("retries", body.get("max_retries", 3)))
+        entry_url = body.get("entry_url", "")
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Validate `retries` from the request body to avoid 500s on bad input.

Parsing `retries` with `int(...)` will raise `ValueError` for non-integer input (e.g. "many", a float), causing a 500 for a client error. Please validate `retries` first and return a 400 with a clear message when it’s not a positive integer or is out of the allowed range.
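
A minimal sketch of such validation, assuming an upper bound of 10 retries (the PR does not state an allowed range) and a hypothetical helper name `parse_retries`. It returns either a validated value or an error message that the handler could send back with a 400.

```python
from typing import Optional, Tuple

MAX_RETRIES = 10  # assumed upper bound; not taken from the PR


def parse_retries(body: dict, default: int = 3) -> Tuple[Optional[int], Optional[str]]:
    """Return (retries, None) on success or (None, error_message) on bad input."""
    raw = body.get("retries", body.get("max_retries", default))
    # bool is a subclass of int, so reject it explicitly; floats are rejected too.
    if isinstance(raw, bool) or not isinstance(raw, (int, str)):
        return None, "retries must be an integer"
    try:
        value = int(raw)
    except ValueError:
        return None, "retries must be an integer"
    if not 1 <= value <= MAX_RETRIES:
        return None, f"retries must be between 1 and {MAX_RETRIES}"
    return value, None
```

In the handler this could be used as `retries, err = parse_retries(body)`, followed by `self._send_json(400, {"status": "error", "msg": err})` and an early return when `err` is not `None`.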
</issue_to_address>

### Comment 3
<location path="AGENTS.md" line_range="28" />
<code_context>
 # Install with optional extras
-uv sync --extra icon-click   # adds ddddocr + onnxruntime (needed for icon_click and word_click)
+uv sync --extra icon-click   # ddddocr + onnxruntime (icon_click pipeline)
+uv sync --extra word-click   # onnxruntime + opencv-headless + ddddocr (word_click pipeline, local YOLO+Siamese)
 uv sync --extra dev          # pytest, respx, ruff, hypothesis
 uv sync --extra docs         # mkdocs-material
</code_context>
<issue_to_address>
**issue (typo):** The package name here likely should be `opencv-python-headless`, and you may want spacing around `YOLO + Siamese`.

In other docs (README extras table and `docs/word-click.md`), this extra is documented as `opencv-python-headless`, so using `opencv-headless` here is inconsistent and could mislead users installing the extra. Also, for consistency with other references, consider `local YOLO + Siamese` instead of `YOLO+Siamese`.

```suggestion
uv sync --extra word-click   # onnxruntime + opencv-python-headless + ddddocr (word_click pipeline, local YOLO + Siamese)
```
</issue_to_address>

### Comment 4
<location path="src/crack_tcaptcha/solvers/word_ocr.py" line_range="268" />
<code_context>
+    return arr[None, ...]
+
+
+def _siamese_score_batch(crops: list[np.ndarray], ref: np.ndarray) -> list[float]:
+    """Score every crop against the ref in one (or as few as possible) ORT calls.
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring `_siamese_score_batch` and the greedy assignment loop to centralize preprocessing, isolate the batch-detection logic, and use more declarative index selection for clearer, less error-prone control flow.

You can keep all current behavior but simplify two of the more complex areas: `_siamese_score_batch` and the greedy assignment.

---

### 1) Simplify `_siamese_score_batch` structure

Right now the function both:

* Detects whether batching is supported.
* Prepares crops twice (once in the batch `try`, once in the per‑pair path).

You can reduce branching and duplicate work by:

* Preprocessing all crops once up front.
* Moving the “try batch once, then cache” logic into a small helper.
* Keeping `_siamese_batch_supported` behavior exactly the same.

This keeps the feature (dynamic detection + fallback) but makes the main flow easier to read and cheaper.

```python
def _siamese_score_batch(crops: list[np.ndarray], ref: np.ndarray) -> list[float]:
    global _siamese_batch_supported

    if not crops:
        return []

    sess = _get_siamese_session()
    assert _siamese_input_names is not None
    n0, n1 = _siamese_input_names

    ref_prepped = _prep_siamese(ref)           # (1, 3, 52, 52)
    prepped = [_prep_siamese(c) for c in crops]  # list of (1, 3, 52, 52)

    # try batched once; cache decision
    if _siamese_batch_supported is not False:
        try:
            batch = np.concatenate(prepped, axis=0)          # (N, 3, 52, 52)
            refs = np.repeat(ref_prepped, len(prepped), 0)   # (N, 3, 52, 52)
            pred = sess.run(None, {n0: batch, n1: refs})[0]
            arr = np.asarray(pred).reshape(-1)
            if arr.size == len(prepped):
                _siamese_batch_supported = True
                return [float(v) for v in arr]
        except Exception as e:
            log.info("word_click siamese batch not supported, using per-pair: %s", e)
            _siamese_batch_supported = False

    # per-pair fallback (same semantics as current code)
    out: list[float] = []
    for p in prepped:
        pred = sess.run(None, {n0: p, n1: ref_prepped})[0]
        out.append(float(np.asarray(pred).reshape(-1)[0]))
    return out
```

Benefits:

* Only one `_prep_siamese` loop.
* The control flow is linear with a clearly isolated “batch probe” block.
* Maintains the same logging and `_siamese_batch_supported` behavior.

---

### 2) Make greedy assignment more declarative

The greedy assignment currently manually tracks `best_idx` / `best_score` and then re‑scans when everything is used. You can make this more readable by:

* Maintaining a set of unused indices.
* Precomputing the “global best” index per target for the reuse case.
* Using `max(..., key=...)` instead of hand‑rolled loops.

This preserves exactly the same behavior (best unused per target, otherwise reuse best overall), but shrinks the imperative logic.

```python
    # Full score matrix: rows = targets, cols = crop indices.
    score_matrix: list[list[float]] = []
    for ch in targets:
        ref = _render_char(ch)
        score_matrix.append(_siamese_score_batch(crops, ref))

    # Precompute global best index per target for the "reuse best overall" case.
    global_best_idx: list[int] = []
    for scores in score_matrix:
        if not scores:
            global_best_idx.append(-1)
            continue
        global_best_idx.append(max(range(len(scores)), key=scores.__getitem__))
```

The assignment loop is better written with a set of unused indices:

```python
    result: list[tuple[int, int]] = []
    unused: set[int] = set(range(len(crops)))

    for ti, ch in enumerate(targets):
        scores = score_matrix[ti]

        # best among unused, if any
        if unused:
            best_idx = max(unused, key=scores.__getitem__)
            best_score = scores[best_idx]
        else:
            best_idx = global_best_idx[ti]
            if best_idx < 0:
                raise SolveError(f"word_click: no candidate for target {ch!r}")
            best_score = scores[best_idx]

        if best_idx in unused:
            unused.remove(best_idx)

        result.append(centers[best_idx])
        log.info("word_click: %r → %s (score=%.3f)", ch, centers[best_idx], best_score)
```

This keeps the same greedy strategy and reuse semantics, but is easier to follow and modify.
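
To make the reuse semantics concrete, here is a self-contained sketch of the same greedy rule on a toy score matrix. The function name `greedy_assign` is hypothetical and the scores are made up; it returns crop indices rather than centers so it can run without the solver's other helpers.

```python
from typing import List, Sequence


def greedy_assign(score_matrix: Sequence[Sequence[float]], n_crops: int) -> List[int]:
    """Pick one crop index per target: best unused crop, else best overall."""
    if n_crops == 0:
        raise ValueError("no candidate crops")
    unused = set(range(n_crops))
    picked: List[int] = []
    for scores in score_matrix:
        if len(scores) != n_crops:
            raise ValueError("each target needs one score per crop")
        # Prefer crops nobody has claimed yet; once all are claimed, allow reuse.
        pool = unused if unused else range(n_crops)
        best = max(pool, key=scores.__getitem__)
        unused.discard(best)
        picked.append(best)
    return picked


# Three targets, three crops: target 1 scores highest on crop 0, but crop 0
# was already taken by target 0, so it falls back to its best unused crop.
print(greedy_assign([[0.9, 0.8, 0.1], [0.95, 0.2, 0.3], [0.1, 0.2, 0.3]], 3))

# Three targets, two crops: the last target reuses its best overall crop.
print(greedy_assign([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]], 2))
```

The first call yields `[0, 2, 1]` and the second `[0, 1, 1]`. Note that the result depends on target order: a greedy pass is not guaranteed to find the assignment with the highest total score.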

---

If you’re happy with the current behavior, these two focused refactors remove some of the “manual” bookkeeping and branching without dropping any of the robustness or performance tuning you’ve added.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

lifefloating

Considering porting tdc to Rust; the tdc step is not fast enough.

lifefloating self-assigned this Apr 24, 2026

sourcery-ai Bot reviewed Apr 24, 2026

Comment thread src/crack_tcaptcha/solvers/word_ocr.py

Comment thread src/crack_tcaptcha/server.py

Comment thread AGENTS.md

Comment thread src/crack_tcaptcha/solvers/word_ocr.py

lifefloating added 3 commits April 24, 2026 15:11

docs: update README

b9bcf4e

feat: add dependency groups

16e0cef

refactor: update test cases for word_click pipeline

a62268b

lifefloating commented Apr 24, 2026

lifefloating merged commit 36e5e39 into main Apr 24, 2026
8 checks passed

lifefloating merged 4 commits into main from dev-yolo8-siamese
