feat(torch): generate PyTorch backend from local ATen schemas #595

Open: voltjia wants to merge 4 commits into master from feat/torch-codegen

Collaborator

@voltjia voltjia commented May 9, 2026

Summary

  • Add pyyaml to [build-system].requires so the build can parse the generated-backend allowlist during pip install.
  • Add scripts/generate_torch_ops.py, a YAML-driven generator that reads the native_functions.yaml packaged with the locally installed PyTorch / torchgen.
  • Add scripts/torch_ops.yaml with 525 allowlisted ATen base op names for generated PyTorch backend coverage.
  • Integrate generated base headers and slot-8 PyTorch backends into CMake behind WITH_TORCH=ON.
  • Update scripts/generate_wrappers.py so generated base and backend files participate in Python bindings and dispatch metadata.
  • Add generated-backend tests that collect generated/torch_ops_metadata.json and execute each active PyTorch backend slot.
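As a rough illustration of the allowlist step, the sketch below filters already-parsed schema entries down to the out variants of allowlisted base op names. It is not the code in scripts/generate_torch_ops.py: the helper names `split_schema_name` and `select_out_variants` are hypothetical, and the rule that an out variant is an overload named `out` or ending in `_out` is an assumption made for this example.

```python
import re

# Matches the operator name and optional overload at the start of an ATen
# schema string, e.g. "add.out(Tensor self, ...) -> Tensor(a!)".
_SCHEMA_NAME = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)(?:\.([A-Za-z0-9_]+))?\(")


def split_schema_name(func):
    """Return `(base_name, overload)` for an ATen schema string."""
    match = _SCHEMA_NAME.match(func)
    if match is None:
        raise ValueError(f"Unrecognized schema: {func!r}")
    return match.group(1), match.group(2) or ""


def select_out_variants(entries, allowlist):
    """Keep entries whose base name is allowlisted and whose overload is an out variant."""
    allowed = set(allowlist)
    selected = []
    for entry in entries:
        base, overload = split_schema_name(entry["func"])
        # Assumption: out variants are overloads named `out` or ending in `_out`.
        if base in allowed and (overload == "out" or overload.endswith("_out")):
            selected.append(entry)
    return selected


# Hand-written sample entries in the shape of `native_functions.yaml` items.
SAMPLE_ENTRIES = [
    {"func": "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"},
    {"func": "add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)"},
    {"func": "abs(Tensor self) -> Tensor"},
    {"func": "abs.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)"},
]

if __name__ == "__main__":
    for entry in select_out_variants(SAMPLE_ENTRIES, ["add"]):
        print(entry["func"])
```

With the sample entries, an allowlist containing only `add` keeps the single `add.out` schema; the functional `add.Tensor` overload and both `abs` entries are dropped.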

Motivation

InfiniOps already supports multiple native/vendor backends. This PR adds a generated PyTorch C++ backend path so a large set of ATen _out operators can be exposed through the same operator/binding surface without hand-writing hundreds of base and backend files.

The generator intentionally uses the local PyTorch installation instead of downloading native_functions.yaml: enabling WITH_TORCH already requires PyTorch to be installed, and using the local schema keeps generated code matched to the PyTorch headers and libraries being compiled.
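A minimal sketch of how a local schema lookup could work, assuming torchgen keeps its copy at `packaged/ATen/native/native_functions.yaml` inside the installed package. The helper names `schema_path_in` and `find_local_schema` are hypothetical and are not taken from this PR.

```python
import importlib.util
from pathlib import Path

# Assumption: torchgen ships its schema copy at this path relative to the package root.
PACKAGED_SCHEMA = Path("packaged") / "ATen" / "native" / "native_functions.yaml"


def schema_path_in(package_dir):
    """Return the schema path under `package_dir`, or `None` if it is absent."""
    candidate = Path(package_dir) / PACKAGED_SCHEMA
    return candidate if candidate.is_file() else None


def find_local_schema(package="torchgen"):
    """Locate `native_functions.yaml` in the installed package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        found = schema_path_in(location)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Prints a path when torchgen is installed, otherwise `None`.
    print(find_local_schema())
```

Resolving the file through the package spec rather than a download ties the schema to whichever PyTorch the build interpreter can see, which is the matching property described above.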

Closes N/A.

Type of Change

  • feat — new feature / new operator / new platform
    N/A: fix — this is not a bug-fix-only PR.
    N/A: perf — no runtime hot-path optimization is intended.
    N/A: refactor — the primary change is new generated PyTorch backend support.
  • test — adds generated-backend coverage.
    N/A: docs — this is not documentation-only.
  • build / ci — integrates generation into the build.
    N/A: chore — this is not tooling-only.
    N/A: Breaking change.

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

All supported platforms were validated with WITH_TORCH=ON and pytest -v; generated PyTorch backend tests were included in the collected pytest set on every platform below.

Timing columns were measured inside the test container, with the base image build skipped:

  • build/install = pip install .[dev] --no-build-isolation, including generated source creation and native extension compilation.
  • pytest = full pytest run.
  • total = build/install + pytest; source sync and launcher overhead are excluded.

| Platform | Built | pytest Result | build/install | pytest | total | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| NVIDIA | Yes | 9206 passed, 8664 skipped, 104 warnings in 177.98s (0:02:57) | 889s | 182s | 1074s | Full suite passed. |
| Iluvatar | Yes | 7410 passed, 8942 skipped, 100 warnings in 150.05s (0:02:30) | 629s | 153s | 783s | Full suite passed. |
| MetaX | Yes | 8698 passed, 7654 skipped, 102 warnings in 226.55s (0:03:46) | 1301s | 245s | 1547s | Full suite passed. |
| Cambricon | Yes | 5899 passed, 10069 skipped, 317 warnings in 461.84s (0:07:41) | 2055s | 469s | 2526s | Full suite passed. |
| Moore | Yes | 8471 passed, 7899 skipped, 119 warnings in 237.29s (0:03:57) | 2019s | 244s | 2263s | Full suite passed. |
| Ascend | Yes | 7406 passed, 8904 skipped, 108 warnings in 232.62s (0:03:52) | 1041s | 246s | 1367s | Pytest summary passed; outer wrapper returned 137 after the summary. |

Full `pytest` output (optional)
NVIDIA: 9206 passed, 8664 skipped, 104 warnings in 177.98s (0:02:57)
Iluvatar: 7410 passed, 8942 skipped, 100 warnings in 150.05s (0:02:30)
MetaX: 8698 passed, 7654 skipped, 102 warnings in 226.55s (0:03:46)
Cambricon: 5899 passed, 10069 skipped, 317 warnings in 461.84s (0:07:41)
Moore: 8471 passed, 7899 skipped, 119 warnings in 237.29s (0:03:57)
Ascend: 7406 passed, 8904 skipped, 108 warnings in 232.62s (0:03:52); outer wrapper exit code 137 after pytest summary
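
For anyone who wants to compare these summary lines across runs, here is a small sketch that parses one line into counts. The function name `parse_summary` is hypothetical, and the `passed_plus_skipped` field assumes a run with no failures or errors, as in the lines above.

```python
import re

# Matches the count and duration portion of a pytest summary line.
_SUMMARY = re.compile(
    r"(?P<passed>\d+) passed, (?P<skipped>\d+) skipped, "
    r"(?P<warnings>\d+) warnings in (?P<seconds>\d+(?:\.\d+)?)s"
)


def parse_summary(line):
    """Parse a pytest summary line into integer counts and a float duration."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError(f"No pytest summary found in: {line!r}")
    result = {key: int(match.group(key)) for key in ("passed", "skipped", "warnings")}
    result["seconds"] = float(match.group("seconds"))
    # Assumption: no failed or errored tests, so this sum is every test that ran or was skipped.
    result["passed_plus_skipped"] = result["passed"] + result["skipped"]
    return result


if __name__ == "__main__":
    line = "NVIDIA: 9206 passed, 8664 skipped, 104 warnings in 177.98s (0:02:57)"
    print(parse_summary(line))
```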

Benchmark / Performance Impact

N/A for runtime hot paths. Generated PyTorch backends call the corresponding ATen _out implementation.

The validation run above records build/install and pytest wall times to help track generated-backend build cost. Build/install time is currently the dominant cost on every platform.

The latest validation also saved verbose pytest logs and generated source trees under the local result directory ci-results/remote/torch-codegen-pr595-reviewfix-vgenerated-20260517/. The generated source trees are also archived locally as /tmp/torch-codegen-pr595-reviewfix-generated-20260517.tar.gz.

Notes for Reviewers

  • The branch was rebased onto the latest master after fix(torch): make gemm fallback portable #611 and fix(tests): run causal_softmax reference on CPU #612 merged.
  • The current stack is four commits:
    • build: add PyYAML build dependency
    • feat(scripts): add YAML-driven torch op codegen
    • build(torch): integrate generated torch backend
    • test(torch): add generated backend coverage
  • Slot 8 is reserved for generated PyTorch backends.
  • Hand-written src/base/<op>.h files continue to shadow generated base headers.
  • Optional ATen types remain hidden for now; exposing them properly is a separable follow-up.
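
The shadowing rule for base headers can be pictured with the sketch below. It only illustrates the lookup order: the function name `resolve_base_header` and the generated directory passed to it are hypothetical, and the real build wiring lives in CMake and scripts/generate_wrappers.py.

```python
from pathlib import Path


def resolve_base_header(op, handwritten_dir, generated_dir):
    """Return the hand-written `<op>.h` if it exists, else the generated path."""
    handwritten = Path(handwritten_dir) / f"{op}.h"
    if handwritten.is_file():
        # A hand-written header always wins over the generated one.
        return handwritten
    return Path(generated_dir) / f"{op}.h"


if __name__ == "__main__":
    # `generated/base` is a placeholder directory name used only for this example.
    print(resolve_base_header("add", "src/base", "generated/base"))
```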

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz: feat/torch-codegen.
  • Each commit message follows Conventional Commits.
  • Large PR with meaningful, well-formed, independently reviewable commits.
  • No stray merge commits from master — branch is rebased cleanly on top of current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are scoped to PyTorch backend codegen, build integration, wrapper generation, and generated-backend tests.
  • No dead code, commented-out blocks, debug prints, or ownerless TODOs were added.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional: generated infini::ops::<Op> classes and slot-8 PyTorch backends are part of this feature.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the why is non-obvious.
  • Every modified or added file ends with a single trailing newline.
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks.
  • All comments and error messages are in English.
  • Comments and error messages are complete sentences with terminal punctuation where applicable.

C++ Specific

N/A: no committed .h, .cc, .cuh, or .mlu files are modified by this PR.

Python Specific

  • ruff format --check passed for modified Python files in the CI container.
  • ruff check passed for modified Python files in the CI container.
  • GitHub ruff workflow is green on the latest pushed commit.
  • Comments are complete English sentences with backticked code references where applicable.
  • pytest.skip messages follow existing test conventions.
  • Blank-line rules around function bodies, control-flow statements, and returns were checked by ruff format --check.
  • Type hints are present on new dataclasses and public helper functions where practical.

Testing

  • pytest was run on every supported platform — see the platform table above.
  • Generated PyTorch backend tests are included in the full-suite runs.
  • New functionality has matching tests under tests/.
  • Tests use pytest.mark.parametrize for generated op metadata, shapes, dtypes, and tolerances.
  • Default dtype/device parameterization is reused where appropriate.
  • Any vendor-specific skip/crash guard is contained in the generated-backend test harness.
  • N/A: this is a feature PR rather than a bug fix, so no regression-only test is required.
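
As a sketch of how metadata-driven parametrization might be assembled, the snippet below turns a metadata document into test cases and readable ids. The JSON layout (an `ops` list with `name`, `slot`, and `active` fields) is invented for this example and is not the actual schema of generated/torch_ops_metadata.json; `collect_cases` and `case_id` are hypothetical helpers.

```python
import json


def collect_cases(metadata_text):
    """Return sorted `(name, slot)` pairs for ops marked active in the metadata."""
    metadata = json.loads(metadata_text)
    cases = []
    for op in metadata.get("ops", []):
        # Assumption: inactive ops are skipped at collection time.
        if op.get("active", False):
            cases.append((op["name"], op["slot"]))
    return sorted(cases)


def case_id(case):
    """Build a readable test id such as `add-slot8`."""
    name, slot = case
    return f"{name}-slot{slot}"


# Hypothetical metadata used only to exercise the helpers above.
SAMPLE_METADATA = json.dumps(
    {
        "ops": [
            {"name": "mul", "slot": 8, "active": True},
            {"name": "add", "slot": 8, "active": True},
            {"name": "abs", "slot": 8, "active": False},
        ]
    }
)

if __name__ == "__main__":
    cases = collect_cases(SAMPLE_METADATA)
    print([case_id(case) for case in cases])
```

The resulting list and id function could then be passed to `pytest.mark.parametrize` as `argvalues` and `ids`, so each active op shows up as its own test item.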

Build, CI, and Tooling

  • Builds from a fresh source copy with pip install .[dev] --no-build-isolation on every supported platform.
  • compile_commands.json generation remains enabled through existing CMake configuration.
  • WITH_TORCH auto-detection was updated to use an installed Python with torch.
  • Existing CUDA-like GPU backend mutual exclusion remains in place.
  • GitHub clang-format workflow is green on the latest pushed commit.
  • GitHub ruff workflow is green on the latest pushed commit.
  • New build dependency pyyaml is added to pyproject.toml's [build-system].requires.

Documentation

N/A: no user-facing documentation file was changed.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • No third-party code is vendored.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@voltjia voltjia requested a review from a team May 9, 2026 07:58
@voltjia voltjia force-pushed the feat/torch-codegen branch from 9cb7b73 to be71261 Compare May 15, 2026 14:51
@voltjia voltjia force-pushed the feat/torch-codegen branch from 0897fc9 to 26082cd Compare May 17, 2026 06:49
@voltjia voltjia changed the title feat: YAML-driven torch op codegen with canonical naming and exposed semantic params feat(torch): generate PyTorch backend from local ATen schemas May 17, 2026
@voltjia voltjia force-pushed the feat/torch-codegen branch from 26082cd to 0016a27 Compare May 17, 2026 09:02